Abstract

This paper generalizes the asymptotic properties obtained for the
observation-driven time series models considered by [7], in the sense
that the conditional law of each observation is also permitted to depend
on the parameter. The existence of ergodic solutions and the consistency
of the Maximum Likelihood Estimator (MLE) are derived under
easy-to-check conditions. The obtained conditions apply to a wide class
of models. We illustrate our results with specific observation-driven
time series, including the recently introduced NBIN-GARCH and NM-GARCH
models, demonstrating the consistency of the MLE for these two models.
1 Introduction

Observation-driven time series models have been widely used in various
disciplines such as economics, finance, epidemiology, population dynamics,
etc. These models were introduced by [4] and later considered by [19],
[5], [11], [17], [9], [6] and [7]. The celebrated GARCH(1,1) model, see [2],
as well as most of the models derived from it, see [3] for a list of some
of them, are typical examples of observation-driven models.
Observation-driven models have the nice feature that the associated
(conditional) likelihood and its derivatives are easy to compute and that
prediction is straightforward. Proving the consistency of the maximum
likelihood estimator (in short, MLE) for this class of models can be
cumbersome, except when it can be derived using computations specific to
the studied model (the GARCH(1,1) case being one of the most celebrated
examples). When the observed variable is discrete, general consistency
results have been obtained only recently in [6] or [7] (see also [13] for
the existence of stationary and ergodic solutions to some
observation-driven time series models). However, the consistency result of
[7] applies to a restricted class of models and does not cover the case
where the distribution of the observations given the hidden variable also
depends on an unknown parameter. We now introduce three simple examples
to which the results of [7] cannot be directly applied. The first one is
the negative binomial integer-valued GARCH (NBIN-GARCH) model, which was
first introduced by [20] as a generalization of the Poisson INGARCH model.
The NBIN-GARCH model belongs to the class of integer-valued GARCH
models that account for overdispersion (i.e., the variance is larger than
the mean) and potential heavy tails in the high values. In [20], the
author applied this model to the counts of poliomyelitis cases in the USA
from 1970 to 1983 reported by the Centres for Disease Control, where
overdispersion was detected. The estimation results showed that the
NBIN-GARCH(1,1) model outperformed some commonly used models such as the
Poisson and Double Poisson models. The NBIN-GARCH(1,1) model is formally
defined as follows.

Example 1 (NBIN-GARCH(1,1) model). Consider the following recursion.

\[
\begin{aligned}
X_{k+1} &= \omega + a X_k + b Y_k , \\
Y_{k+1} \mid X_{0:k+1}, Y_{0:k} &\sim \mathcal{NB}\left(r, \frac{X_{k+1}}{1+X_{k+1}}\right) ,
\end{aligned}
\tag{1}
\]

where $X_k$ takes values in $\mathsf{X} = \mathbb{R}_+$, $Y_k$ takes values
in $\mathbb{Z}_+$ and $\theta = (\omega, a, b, r) \in (0,\infty)^4$ is an
unknown parameter. In (1), $\mathcal{NB}(r,p)$ denotes the negative
binomial distribution with parameters $r > 0$ and $p \in (0,1)$, that is:
if $Y \sim \mathcal{NB}(r,p)$, then
$P(Y = k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} (1-p)^r p^k$ for all
$k \geq 0$, where $\Gamma$ stands for the Gamma function. Though
substantial analysis on this model has been carried out in the literature,
to the best of our knowledge, the consistency of the MLE has not been
treated, see the end of the discussions of Section 6 in [20].
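
To make the recursion of Example 1 concrete, the following sketch simulates a path of the model using only the Python standard library. The function names, the default initial value `x0` and the seed handling are illustrative assumptions, not part of the original model description; the sampler draws from the negative binomial distribution by plain inversion of its cumulative distribution function.

```python
import random


def sample_neg_binomial(r, p, rng):
    """Draw Y with P(Y = k) = Gamma(k+r) / (k! Gamma(r)) * (1-p)^r * p^k."""
    u = rng.random()
    pmf = (1.0 - p) ** r          # P(Y = 0)
    cdf = pmf
    k = 0
    # The cap only guards against floating-point stalls in the far tail.
    while cdf < u and k < 10**6:
        pmf *= (k + r) / (k + 1) * p   # P(Y = k+1) obtained from P(Y = k)
        k += 1
        cdf += pmf
    return k


def simulate_nbin_garch(n, omega, a, b, r, x0=1.0, seed=0):
    """Simulate (X_k, Y_k), k = 0..n-1, with Y_k | X_k ~ NB(r, X_k / (1 + X_k))."""
    rng = random.Random(seed)
    x = x0
    xs, ys = [], []
    for _ in range(n):
        y = sample_neg_binomial(r, x / (1.0 + x), rng)
        xs.append(x)
        ys.append(y)
        x = omega + a * x + b * y      # hidden-state update of the recursion
    return xs, ys
```

A call such as `simulate_nbin_garch(500, omega=1.0, a=0.3, b=0.2, r=2.0)` returns the hidden path and the observed counts; only the counts would be available to the statistician.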

The second example is the univariate normal mixture GARCH model
(NM-GARCH) proposed by [12] and later considered by [1]. The NM-GARCH
model is another natural extension of GARCH processes, where
the usual Gaussian conditional distribution of the observations given the
hidden volatility variable is replaced by a mixture of Gaussian distributions
given a hidden vector volatility variable. The NM-GARCH model is able to
capture time variation in both conditional skewness and kurtosis,
while the classical GARCH cannot. In [1], the NM-GARCH(1,1) model
was applied to exchange rate data consisting of daily prices
in US dollars of three different currencies (British pound, euro and Japanese
yen) from 2 January 1989 to 31 December 2002. The empirical evidence
suggested that the NM(2)-GARCH(1,1) model performed best when compared
with the classical GARCH(1,1), standardized symmetric and skewed
t-GARCH(1,1) models applied to the same data. The definition of this
model is formally stated as follows.

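Example 2 (NM(d)-GARCH(1,1) model). Let $d \in \mathbb{N} \setminus \{0\}$
and consider the following recursion.

\[
\begin{aligned}
& X_{k+1} = \omega + A X_k + Y_k^2\, b , \\
& Y_{k+1} \mid X_{0:k+1}, Y_{0:k} \sim G^\theta(X_{k+1}; \cdot) , \\
& \frac{\mathrm{d} G^\theta(x;\cdot)}{\mathrm{d}\nu}(y)
  = \sum_{\ell=1}^{d} \gamma_\ell\,
    \frac{\mathrm{e}^{-y^2/(2x_\ell)}}{(2\pi x_\ell)^{1/2}} ,
  \qquad x \in (0,\infty)^d ,\ y \in \mathbb{R} ,
\end{aligned}
\tag{2}
\]

where $\nu$ is the Lebesgue measure on $\mathbb{R}$,
$X_k = [X_{1,k} \dots X_{d,k}]^T$ takes values in
$\mathsf{X} = (0,\infty)^d$, $\gamma = [\gamma_1 \dots \gamma_d]^T$ is a
d-dimensional vector of mixture coefficients belonging to the
d-dimensional simplex

\[
\mathcal{P}_d = \Big\{ \gamma \in \mathbb{R}_+^d : \sum_{\ell=1}^{d} \gamma_\ell = 1 \Big\} ,
\tag{3}
\]

$\omega$, $b$ are d-dimensional vector parameters with positive and
non-negative entries, respectively, and $A$ is a $d \times d$ matrix
parameter with non-negative entries. Here we have
$\theta = (\gamma, \omega, A, b)$. Note that $G^\theta$ depends on $\theta$
only through the mixture coefficients $\gamma_1, \dots, \gamma_d$. If
$d = 1$, we obtain the usual conditionally Gaussian GARCH(1,1) process. In
such a case, since $\gamma = \gamma_1 = 1$, $G^\theta$ no longer depends on
$\theta$. To the best of our knowledge, the usual consistency proof of the
MLE for the GARCH cannot be directly adapted to this model.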
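
As an illustration of Example 2, the sketch below evaluates the Gaussian mixture density appearing in (2), applies one step of the volatility recursion and simulates a short path. It uses only the standard library; the function names and the list-of-lists representation of the matrix $A$ are assumptions made for this example.

```python
import math
import random


def nm_density(x, y, gamma):
    """Mixture density: sum over l of gamma_l * N(y; mean 0, variance x_l)."""
    return sum(g * math.exp(-y * y / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
               for g, v in zip(gamma, x))


def nm_garch_step(x, y, omega, A, b):
    """One volatility update X_{k+1} = omega + A X_k + Y_k^2 b."""
    d = len(x)
    return [omega[i] + sum(A[i][j] * x[j] for j in range(d)) + y * y * b[i]
            for i in range(d)]


def simulate_nm_garch(n, gamma, omega, A, b, x0, seed=0):
    """Simulate n observations; each Y_k is drawn from the mixture given X_k."""
    rng = random.Random(seed)
    x, ys = list(x0), []
    for _ in range(n):
        # Pick a mixture component, then draw a centered Gaussian with its variance.
        comp = rng.choices(range(len(gamma)), weights=gamma)[0]
        y = rng.gauss(0.0, math.sqrt(x[comp]))
        ys.append(y)
        x = nm_garch_step(x, y, omega, A, b)
    return ys
```

With `gamma = [1.0]` and one-dimensional `omega`, `A`, `b`, the same functions reduce to the conditionally Gaussian GARCH(1,1) case mentioned above.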

Finally, we consider the following new example, where a threshold is
added to the usual INGARCH model in the conditional distribution.


Example 3 (Threshold INGARCH model). Consider the following recursion.

\[
\begin{aligned}
X_{k+1} &= \omega + a X_k + b Y_k , \\
Y_{k+1} \mid X_{0:k+1}, Y_{0:k} &\sim \mathcal{P}(X_{k+1} \wedge \tau) ,
\end{aligned}
\]

where $X_k$ takes values in $\mathsf{X} = (0,\infty)$, $Y_k$ takes values
in $\mathbb{Z}_+$ and $\theta = (\omega, a, b, \tau) \in (0,\infty)^4$ is
an unknown parameter. Compared with the usual INGARCH model, a threshold
$\tau$ has been added in the conditional observation distribution. This
corresponds to the practical case where the hidden variable has an
influence on the observation only up to this threshold.

For a well-specified model, a classical approach to establish the
consistency of the MLE generally involves two main steps: first, the
maximum likelihood estimator (MLE) converges to the maximizing set
$\Theta_\star$ of a limit criterion, and second, the maximizing set indeed
reduces to the true parameter $\theta_\star$, which is usually referred to
as solving the identifiability problem. In this paper, we are interested in
the first step, that is, the convergence of the MLE. We extend the
convergence result of the MLE obtained in [7], which is valid for a
restricted class of models, to a larger class of models which contains the
three examples introduced above. More precisely, we show the convergence
of the MLE in observation-driven models where the probability
distributions of the observations explicitly depend on the unknown
parameters. Moreover, we provide very simple conditions that are easy to
check, as shown by the three illustrating examples.

The paper is organized as follows. Specific definitions and notation are
introduced in Section 2. Then, Section 3 contains the main contribution of
the paper, that is, sufficient conditions for the existence of ergodic
solutions and for the consistency of the MLE. These results are then
applied in Section 4 to the three examples introduced above. Numerical
experiments for the NBIN-GARCH(1,1) model are given in Section 5. Finally,
Section 6 provides the proofs of the main results, mainly inspired by [7].

2 Definitions and notation 

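Consider a bivariate stochastic process $\{(X_k, Y_k) : k \in \mathbb{Z}_+\}$
on $\mathsf{X} \times \mathsf{Y}$, where $(\mathsf{X}, d)$ is a complete and
separable metric space endowed with the associated Borel $\sigma$-field
$\mathcal{X}$ and $(\mathsf{Y}, \mathcal{Y})$ is a Borel space. Let
$(\Theta, \Delta)$, the set of parameters, be a compact metric space,
$\{G^\theta : \theta \in \Theta\}$ be a family of probability kernels on
$\mathsf{X} \times \mathcal{Y}$ and
$\{(x, y) \mapsto \psi^\theta_y(x) : \theta \in \Theta\}$ be a family of
measurable functions from
$(\mathsf{X} \times \mathsf{Y}, \mathcal{X} \otimes \mathcal{Y})$ to
$(\mathsf{X}, \mathcal{X})$. The observation-driven time series model can
be formally defined as follows.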

Definition 1. A time series $\{Y_k : k \in \mathbb{Z}_+\}$ valued in
$\mathsf{Y}$ is said to be distributed according to an observation-driven
model with parameter $\theta \in \Theta$ if there is a bivariate Markov
chain $\{(X_k, Y_k) : k \in \mathbb{Z}_+\}$ on
$\mathsf{X} \times \mathsf{Y}$ whose transition kernel $K^\theta$ satisfies

\[
K^\theta((x,y); \mathrm{d}x'\,\mathrm{d}y')
  = \delta_{\psi^\theta_y(x)}(\mathrm{d}x')\, G^\theta(x'; \mathrm{d}y') ,
\tag{5}
\]

where $\delta_a$ denotes the Dirac mass at point $a$. Moreover, we will say
that the observation-driven time series model is dominated by some
$\sigma$-finite measure $\nu$ on $(\mathsf{Y}, \mathcal{Y})$ if for all
$x \in \mathsf{X}$, the probability kernel $G^\theta(x; \cdot)$ is
dominated by $\nu$. In this case we denote by $g^\theta(x; \cdot)$ its
Radon-Nikodym derivative,
$g^\theta(x; y) = \frac{\mathrm{d}G^\theta(x;\cdot)}{\mathrm{d}\nu}(y)$,
and we always assume that for all
$(x, y) \in \mathsf{X} \times \mathsf{Y}$ and for all $\theta \in \Theta$,

\[
g^\theta(x; y) > 0 .
\]

A dominated parametric observation-driven model is thus characterized
by the collection $\{(g^\theta, \psi^\theta) : \theta \in \Theta\}$. The
class of observation-driven time series models is a particular case of
partially-observed Markov chains since only the $Y_k$'s are observed,
whereas the $X_k$'s are hidden variables. Note that our notation for
observation-driven models is slightly different from that of [7], where
their sequence $\{Y_k\}$ corresponds to our sequence $\{Y_{k-1}\}$. Note
also that the process $\{X_k : k \geq 1\}$ by itself is a Markov chain with
transition kernel $R^\theta$ defined by

\[
R^\theta(x; A) = \int \mathbb{1}_A(\psi^\theta_y(x))\, G^\theta(x; \mathrm{d}y) ,
\qquad x \in \mathsf{X},\ A \in \mathcal{X} .
\tag{6}
\]

However, observation-driven time series models do not belong to the class of
hidden Markov models. This can be seen in the following recursive relation,
which holds for all $k \geq 0$,

\[
\begin{aligned}
X_{k+1} &= \psi^\theta_{Y_k}(X_k) , \\
Y_{k+1} \mid \mathcal{F}_k &\sim G^\theta(X_{k+1}; \cdot) ,
\end{aligned}
\]

where $\mathcal{F}_k = \sigma(X_\ell, Y_\ell : \ell \leq k,\ \ell \in \mathbb{Z}_+)$
and which can be represented graphically as below.



Figure 1: Graphical representation of the observation-driven model.


The most popular example is the GARCH(1,1) process, where
$G^\theta(x; \cdot)$ is a centered (say Gaussian) distribution with
variance $x$ and $\psi^\theta_y(x)$ is an affine function of $x$ and $y^2$.
One can readily check that Examples 1 and 2 are other instances of
dominated observation-driven models.

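The inference about the model parameter is carried out by relying on the
conditional likelihood of the observations $(Y_1, \dots, Y_n)$ given
$X_1 = x$ for an arbitrary $x \in \mathsf{X}$. The corresponding
conditional density function with respect to $\nu^{\otimes n}$ is, under
parameter $\theta$, for all $x \in \mathsf{X}$,

\[
y_{1:n} \mapsto \prod_{k=1}^{n}
  g^\theta\big(\psi^\theta\langle y_{1:k-1}\rangle(x); y_k\big) ,
\tag{8}
\]

where, for any vector $y_{1:p} = (y_1, \dots, y_p) \in \mathsf{Y}^p$,
$\psi^\theta\langle y_{1:p}\rangle$ is the
$\mathsf{X} \to \mathsf{X}$ function obtained as the successive composition
of $\psi^\theta_{y_1}$, $\psi^\theta_{y_2}$, ..., and $\psi^\theta_{y_p}$,

\[
\psi^\theta\langle y_{1:p}\rangle
  = \psi^\theta_{y_p} \circ \psi^\theta_{y_{p-1}} \circ \dots \circ \psi^\theta_{y_1} ,
\tag{9}
\]

with the convention $\psi^\theta\langle y_{s:t}\rangle(x) = x$ for $s > t$.
Then, the corresponding (conditional) Maximum Likelihood Estimator (MLE)
$\hat\theta_{x,n}$ of the parameter $\theta$ is defined by

\[
\hat\theta_{x,n} \in \operatorname*{argmax}_{\theta \in \Theta}
  L^\theta_{x,n}\langle Y_{1:n}\rangle ,
\tag{10}
\]

where

\[
L^\theta_{x,n}\langle y_{1:n}\rangle := n^{-1} \sum_{k=1}^{n}
  \ln g^\theta\big(\psi^\theta\langle y_{1:k-1}\rangle(x); y_k\big) .
\tag{11}
\]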
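
The normalized conditional log-likelihood (11) can be evaluated with a single pass over the data, since the argument of the density at step $k$ is obtained from the previous one by applying $\psi^\theta_{y_{k-1}}$. The sketch below does this for the NBIN-GARCH(1,1) model of Example 1, using the negative binomial density with $p = x/(1+x)$; the function names are hypothetical and the code is only an illustration of the formula, not the estimation procedure used in the numerical experiments.

```python
import math


def nb_log_density(x, y, r):
    """Log of the negative binomial density of Example 1 with p = x / (1 + x)."""
    return (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
            - r * math.log1p(x) + y * (math.log(x) - math.log1p(x)))


def normalized_loglik(theta, ys, x):
    """Evaluate (11) for theta = (omega, a, b, r), data ys and starting point x."""
    omega, a, b, r = theta
    total = 0.0
    for y in ys:
        total += nb_log_density(x, y, r)   # here x is psi<y_{1:k-1}> applied to the start
        x = omega + a * x + b * y          # apply psi_{y_k} to prepare the next term
    return total / len(ys)
```

An approximate MLE could then be obtained by maximizing `normalized_loglik` over a compact parameter set with any numerical optimizer; the choice of optimizer is left open here.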

In this contribution, we study the convergence of $\hat\theta_{x,n}$ as
$n \to \infty$ for some well-chosen value of $x$ under the assumption that
the model is well specified and the observations are in a steady state.
This means that we assume that the observations
$\{Y_k : k \in \mathbb{Z}_+\}$ are distributed according to
$\tilde{\mathbb{P}}^{\theta_\star}$ with $\theta_\star \in \Theta$, where,
for all $\theta \in \Theta$, $\tilde{\mathbb{P}}^\theta$ denotes the
stationary distribution of the observation-driven time series
corresponding to the parameter $\theta$. However, whether such a
distribution is well defined is not always obvious. We will use the
following ergodicity assumption.

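(A-1) For all $\theta \in \Theta$, the transition kernel $K^\theta$ of the
complete chain admits a unique stationary distribution $\pi^\theta$ on
$\mathsf{X} \times \mathsf{Y}$.

With this assumption, we can now define $\tilde{\mathbb{P}}^\theta$. The
following notation and definitions will be used throughout the paper.

Definition 2. For any probability distribution $\mu$ on
$\mathsf{X} \times \mathsf{Y}$, we denote by $\mathbb{P}^\theta_\mu$ the
distribution of the Markov chain $\{(X_k, Y_k),\ k \geq 0\}$ with kernel
$K^\theta$ and initial probability measure $\mu$. Under Assumption (A-1),
we denote by $\pi_1^\theta$ and $\pi_2^\theta$ the marginal distributions
of $\pi^\theta$ on $\mathsf{X}$ and $\mathsf{Y}$, respectively, and by
$\mathbb{P}^\theta$ and $\tilde{\mathbb{P}}^\theta$ the probability
distributions defined respectively as follows.

a) $\mathbb{P}^\theta$ denotes the extension of
$\mathbb{P}^\theta_{\pi^\theta}$ on the whole line
$(\mathsf{X} \times \mathsf{Y})^{\mathbb{Z}}$.

b) $\tilde{\mathbb{P}}^\theta$ is the corresponding projection on the
component $\mathsf{Y}^{\mathbb{Z}}$.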

The probability distributions P® and P^ are more formally defined by 
setting, for all m G Z and B G ^ 
F (y^+^- X = P® (x^ X X 5 )) = F^e (x"*+^+ X s) , (12) 

or equivalently, using the canonical functions Y^, k ^"L, 

(h)n+l:oo € -B) = P^ {Ym+l-.oo (z B) = F^g (Ym+l-.oo ^ B) . (13) 

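Here and in what follows, we abusively use the same notation $Y_k$ both for
the canonical projection defined on $\mathsf{Y}^{\mathbb{Z}}$ and for the
one defined on $(\mathsf{X} \times \mathsf{Y})^{\mathbb{Z}_+}$. We also use
the symbols $\mathbb{E}^\theta$ and $\tilde{\mathbb{E}}^\theta$ to denote
the expectations corresponding to $\mathbb{P}^\theta$ and
$\tilde{\mathbb{P}}^\theta$, respectively.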


3 Main results 


3.1 Preliminaries 


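In this section, we follow the same lines as in [7] to derive the
convergence of the MLE $\hat\theta_{x,n}$ for a general class of
observation-driven models. The approach is to establish that, as the
number of observations $n \to \infty$, there exists a
$(\mathsf{Y}^{\mathbb{Z}_-}, \mathcal{Y}^{\otimes \mathbb{Z}_-}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$
measurable function $p^\theta(\cdot \mid \cdot)$ such that the normalized
log-likelihood $L^\theta_{x,n}\langle Y_{1:n}\rangle$ defined in (11), for
some appropriate value of $x$, can be approximated by

\[
n^{-1} \sum_{k=1}^{n} \ln p^\theta(Y_k \mid Y_{-\infty:k-1}) .
\]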


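To define $p^\theta(\cdot \mid \cdot)$, we set, for all
$y_{-\infty:1} \in \mathsf{Y}^{\mathbb{Z}_-}$,

\[
p^\theta(y_1 \mid y_{-\infty:0}) =
\begin{cases}
\displaystyle \lim_{m \to \infty}
  g^\theta\big(\psi^\theta\langle y_{-m:0}\rangle(x); y_1\big)
  & \text{if the limit exists,} \\
\infty & \text{otherwise.}
\end{cases}
\tag{14}
\]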


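By (A-1), the process $Y$ is ergodic under
$\tilde{\mathbb{P}}^{\theta_\star}$ and provided that

\[
\tilde{\mathbb{E}}^{\theta_\star}\big[\ln^+ p^\theta(Y_1 \mid Y_{-\infty:0})\big] < \infty ,
\]

it follows that

\[
\lim_{n \to \infty} L^\theta_{x,n}\langle Y_{1:n}\rangle
  = \tilde{\mathbb{E}}^{\theta_\star}\big[\ln p^\theta(Y_1 \mid Y_{-\infty:0})\big] ,
\qquad \tilde{\mathbb{P}}^{\theta_\star}\text{-a.s.}
\]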


In this paper we show that with probability tending to one, the MLE
$\hat\theta_{x,n}$ eventually lies in a neighborhood of the set

\[
\Theta_\star = \operatorname*{argmax}_{\theta \in \Theta}
  \tilde{\mathbb{E}}^{\theta_\star}\big[\ln p^\theta(Y_1 \mid Y_{-\infty:0})\big] ,
\tag{15}
\]

which only depends on $\theta_\star$. In this contribution, we provide
easy-to-check sufficient conditions implying

\[
\lim_{n \to \infty} \Delta(\hat\theta_{x,n}, \Theta_\star) = 0 ,
\qquad \tilde{\mathbb{P}}^{\theta_\star}\text{-a.s.},
\tag{16}
\]

but, for the sake of brevity, we do not precisely determine the set
$\Theta_\star$. Many approaches have been proposed to investigate this
problem, which is often referred to as the identifiability problem. In
particular cases, one can prove that $\Theta_\star = \{\theta_\star\}$, in
which case the strong consistency of the MLE follows from (16). In
Remark 3 we will mention a general result which specifies how the set
$\Theta_\star$ is related to the true parameter $\theta_\star$. For the
moment, let us mention that we have

\[
\theta_\star \in \Theta_\star ,
\tag{17}
\]

provided that the following assumption holds:

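(B-1) For all $\theta, \theta_\star \in \Theta$, we have

(i) If $\theta \neq \theta_\star$, $y \mapsto p^\theta(y \mid Y_{-\infty:0})$
is a density function $\tilde{\mathbb{P}}^{\theta_\star}$-a.s.

(ii) Under $\tilde{\mathbb{P}}^{\theta_\star}$, the function
$y \mapsto p^{\theta_\star}(y \mid Y_{-\infty:0})$ is the conditional
density function of $Y_1$ given $Y_{-\infty:0}$.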


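Indeed, (17) follows by writing, for all $\theta \in \Theta$,

\[
\tilde{\mathbb{E}}^{\theta_\star}\big[\ln p^{\theta_\star}(Y_1 \mid Y_{-\infty:0})\big]
 - \tilde{\mathbb{E}}^{\theta_\star}\big[\ln p^{\theta}(Y_1 \mid Y_{-\infty:0})\big]
 = \tilde{\mathbb{E}}^{\theta_\star}\left[
     \tilde{\mathbb{E}}^{\theta_\star}\left[
       \ln \frac{p^{\theta_\star}(Y_1 \mid Y_{-\infty:0})}{p^{\theta}(Y_1 \mid Y_{-\infty:0})}
       \,\Big|\, Y_{-\infty:0} \right] \right] ,
\]

which is nonnegative under (B-1) since it is the expectation of a
conditional Kullback-Leibler divergence.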


3.2 Convergence of the MLE 

In this part, we always assume that (A-1) holds. The following is a list of 
additional assumptions on which our convergence result relies. 

(A-2) There exists a function $\bar V : \mathsf{X} \to \mathbb{R}_+$ such
that, for all $\theta \in \Theta$, $\pi_1^\theta(\bar V) < \infty$.

Remark 1. Assumption (A-2) is usually obtained as a byproduct of the 
proof of Assumption (A-1), see Section 3.3. It is here stated as an assumption 
for convenience. 

The following set of conditions can readily be checked on $g^\theta$ and
$\psi^\theta$.

(B-2) For all $y \in \mathsf{Y}$, the function
$(\theta, x) \mapsto g^\theta(x; y)$ is continuous on
$\Theta \times \mathsf{X}$.

(B-3) For all $y \in \mathsf{Y}$, the function
$(\theta, x) \mapsto \psi^\theta_y(x)$ is continuous on
$\Theta \times \mathsf{X}$.

The function $\bar V$ appearing in (B-4)(viii) below is the same one as in
Assumption (A-2). Moreover, in this condition and throughout the paper we
write $f \lesssim V$ for a real-valued function $f$ and a nonnegative
function $V$ defined on the same space $\mathsf{X}$, whenever there exists
a positive constant $c$ such that $|f(x)| \leq c V(x)$ for all
$x \in \mathsf{X}$.

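(B-4) There exist $x_1 \in \mathsf{X}$, a closed set
$\mathsf{X}_1 \subset \mathsf{X}$, $\varrho \in (0,1)$, $C \geq 0$ and
measurable functions $\bar\psi : \mathsf{X}_1 \to \mathbb{R}_+$,
$H : \mathbb{R}_+ \to \mathbb{R}_+$ and
$\bar\phi : \mathsf{Y} \to \mathbb{R}_+$ such that the following assertions
hold.

(i) For all $\theta \in \Theta$ and
$(x, y) \in \mathsf{X} \times \mathsf{Y}$,
$\psi^\theta_y(x) \in \mathsf{X}_1$.

(ii) $\sup_{(\theta, x, y) \in \Theta \times \mathsf{X}_1 \times \mathsf{Y}} g^\theta(x; y) < \infty$.

(iii) For all $\theta \in \Theta$, $n \in \mathbb{Z}_+$,
$x \in \mathsf{X}_1$, and $y_{1:n} \in \mathsf{Y}^n$,

\[
d\big(\psi^\theta\langle y_{1:n}\rangle(x_1), \psi^\theta\langle y_{1:n}\rangle(x)\big)
  \leq \varrho^n\, \bar\psi(x) ,
\tag{18}
\]

(iv) $\bar\psi$ is locally bounded.

(v) For all $\theta \in \Theta$ and $y \in \mathsf{Y}$,
$d(x_1, \psi^\theta_y(x_1)) \leq \bar\phi(y)$.

(vi) For all $\theta \in \Theta$ and
$(x, x', y) \in \mathsf{X}_1 \times \mathsf{X}_1 \times \mathsf{Y}$,

\[
\left| \ln \frac{g^\theta(x; y)}{g^\theta(x'; y)} \right|
  \leq H(d(x, x'))\, \mathrm{e}^{C\,(d(x_1, x) \vee d(x_1, x'))}\, \bar\phi(y) ,
\tag{19}
\]

(vii) $H(u) = O(u)$ as $u \to 0$.

(viii) If $C = 0$, then, for all $\theta \in \Theta$,

\[
G^\theta \ln^+ \bar\phi \lesssim \bar V ,
\tag{20}
\]

otherwise, for all $\theta \in \Theta$,

\[
G^\theta \bar\phi \lesssim \bar V .
\tag{21}
\]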


Let us now state our main result as follows. 

Theorem 3. Assume that (A-1), (A-2), (B-2), (B-3) and (B-4) hold. Then,
letting $x_1 \in \mathsf{X}$ be as in (B-4), the function
$p^\theta(\cdot \mid \cdot)$ defined by (14) with $x = x_1$
satisfies (B-1) and the convergence (16) of the MLE holds with the set
$\Theta_\star$ defined by (15).

For convenience, the proof is postponed to Section 6.1. 

Remark 2. As noticed in [7], the techniques used to prove Theorem 3 also
apply in the misspecified case, where $Y$ is not distributed according to
$\tilde{\mathbb{P}}^{\theta_\star}$. We do not pursue this direction in
this contribution.

The consistency of the MLE then follows from Theorem 3 by the following
remark.

Remark 3. In many specific cases, one can show that $\Theta_\star$ defined
by (15) is the singleton $\{\theta_\star\}$. However, this task appears to
be quite difficult in some cases such as Example 3. Instead one can use
[8, Section 4.2], where it is shown that the assumptions of Theorem 3 imply
that $\Theta_\star$ is exactly the set of parameters $\theta$ such that
$\tilde{\mathbb{P}}^\theta = \tilde{\mathbb{P}}^{\theta_\star}$. Thus we
can conclude that the MLE converges to the equivalence class of the true
parameter. This type of consistency has been introduced by [14] in the
context of hidden Markov models in order to disentangle the proof of the
consistency from the problem of identifiability. Recall that the model is
identifiable if and only if the equivalence classes
$\{\theta : \tilde{\mathbb{P}}^\theta = \tilde{\mathbb{P}}^{\theta_\star}\}$
reduce to singletons $\{\theta_\star\}$ for all $\theta_\star \in \Theta$.


3.3 Ergodicity 

In this section, the observation-driven model is studied to prove
condition (A-1). Since this is a "for all $\theta$ (...)" condition, to
save space and alleviate the notational burden, we will drop the
superscript $\theta$ from, for example, $G^\theta$, $R^\theta$ and
$\psi^\theta$ and respectively write $G$, $R$ and $\psi$ instead.

Ergodicity of Markov chains is usually studied using
$\psi$-irreducibility. This approach is well known to be quite efficient
when dealing with fully dominated models, see [15]. It is not at all the
same picture for observation-driven models, where other tools need to be
invoked, see [10, 7]. Since the ergodicity is studied for a given parameter
$\theta$, the ergodicity results of [7] directly apply, even though
observation-driven models are restricted in this reference to the case
where $g$ does not depend on the unknown parameter $\theta$. Our main
contribution here is to focus on an easy-to-check list of assumptions
yielding the ergodicity conditions (A-1) and (A-2). We also provide a
lemma (Lemma 5) which gives the construction of the instrumental
functions $\alpha$ and $\phi$ used in the list of assumptions.

(A-3) The measurable space $(\mathsf{X}, d)$ is a locally compact, complete
and separable metric space and its associated $\sigma$-field $\mathcal{X}$
is the Borel $\sigma$-field.

(A-4) There exist $(\lambda, \beta) \in (0,1) \times \mathbb{R}_+$ and a
measurable function $V : \mathsf{X} \to \mathbb{R}_+$ such that
$RV \leq \lambda V + \beta$ and $\{V \leq M\}$ is compact for any $M > 0$.

(A-5) The Markov kernel $R$ is weak Feller, that is, for any continuous and
bounded function $f$ defined on $\mathsf{X}$, $Rf$ is continuous and
bounded on $\mathsf{X}$.

(A-6) The Markov kernel $R$ has a reachable point, that is, there exists
$x_0 \in \mathsf{X}$ such that, for any $x \in \mathsf{X}$ and any
neighborhood $\mathcal{N}$ of $x_0$, $R^m(x; \mathcal{N}) > 0$ for at least
one positive integer $m$.
(A-7) We have

\[
\sup_{\substack{(x, x', y) \in \mathsf{X}^2 \times \mathsf{Y} \\ x \neq x'}}
  \frac{d(\psi_y(x), \psi_y(x'))}{d(x, x')} < 1 .
\]

(A-8) There exist a measurable function $\alpha$ from $\mathsf{X}^2$ to
$[0,1]$, a measurable function $\phi : \mathsf{X}^2 \to \mathsf{X}$ and a
measurable function $W : \mathsf{X}^2 \to [1, \infty)$ such that the
following assertions hold.

(i) For all $(x, x') \in \mathsf{X}^2$ and $y \in \mathsf{Y}$,

\[
\min\{g(x; y), g(x'; y)\} \geq \alpha(x, x')\, g(\phi(x, x'); y) .
\tag{22}
\]

(ii) For all $x \in \mathsf{X}$, $W(x, \cdot)$ is bounded in a
neighborhood of $x$, that is, there exists $\gamma_x > 0$ such that
$\sup_{x' \in B(x, \gamma_x)} W(x, x') < \infty$.

(iii) For all $(x, x') \in \mathsf{X}^2$,
$1 - \alpha(x, x') \leq d(x, x')\, W(x, x')$.

(iv) $\sup \left( \int W(\psi_y(x), \psi_y(x'))\, G(\phi(x, x'); \mathrm{d}y) - W(x, x') \right) < \infty$,
where the sup is taken over all $(x, x') \in \mathsf{X}^2$.


We can now state the main ergodicity result. 

Theorem 4. Conditions (A-3), (A-4), (A-5), (A-6), (A-7) and (A-8) imply
that $K$ admits a unique stationary distribution $\pi$ on
$\mathsf{X} \times \mathsf{Y}$. Moreover, $\pi_1 \bar V < \infty$ for every
$\bar V : \mathsf{X} \to \mathbb{R}_+$ such that $\bar V \lesssim V$.

The proof of Theorem 4 is postponed to Section 6.2 for convenience.
The first conclusion of Theorem 4 can directly be applied for all
$\theta \in \Theta$ to check (A-1). The second conclusion can be used to
check (A-2). In doing so, one must take care of the fact that although $V$
may depend on $\theta$, $\bar V$ does not.

Assumptions (A-4), (A-5) and (A-6) have to be checked directly on the
Markov kernel $R$ defined by (6). To this end it can be useful to define,
for any given $x \in \mathsf{X}$, the distribution

\[
\mathbb{P}_x := \mathbb{P}_{\delta_x \otimes G(x; \cdot)}
\tag{23}
\]

on $(\mathsf{X} \times \mathsf{Y})^{\mathbb{Z}_+}$, where $\mathbb{P}_\mu$
is defined for any distribution $\mu$ on $\mathsf{X} \times \mathsf{Y}$ as
in Definition 2. Then the first component process
$\{X_k,\ k \in \mathbb{Z}_+\}$ associated to $\mathbb{P}_x$ is a Markov
chain with Markov kernel $R$ and initial distribution $\delta_x$.
We now provide a general framework for constructing the functions $\alpha$
and $\phi$ that appear in (A-8).

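Lemma 5. Suppose that $\mathsf{X} = \mathsf{C}^{\mathsf{S}}$ for some
measurable space $(\mathsf{S}, \mathcal{S})$ and
$\mathsf{C} \subseteq \mathbb{R}$. Thus for all $x \in \mathsf{X}$, we
write $x = (x_s)_{s \in \mathsf{S}}$, where $x_s \in \mathsf{C}$ for all
$s \in \mathsf{S}$. Suppose moreover that for all
$x = (x_s)_{s \in \mathsf{S}} \in \mathsf{X}$, we can express the
conditional density $g(x; \cdot)$ as a mixture of densities of the form
$j(x_s) h(x_s; \cdot)$ over $s \in \mathsf{S}$. This means that for all
$t \in \mathsf{C}$, $y \mapsto j(t) h(t; y)$ is a density with respect to
$\nu$ and there exists a probability measure $\mu$ on
$(\mathsf{S}, \mathcal{S})$ such that

\[
g(x; y) = \int_{\mathsf{S}} j(x_s)\, h(x_s; y)\, \mu(\mathrm{d}s) ,
\qquad y \in \mathsf{Y} .
\tag{24}
\]

We moreover assume that $h$ takes non-negative values and that one of the
two following assumptions holds.

(F-1) For all $y \in \mathsf{Y}$, the function
$h(\cdot; y) : t \mapsto h(t; y)$ is non-decreasing.

(F-2) For all $y \in \mathsf{Y}$, the function
$h(\cdot; y) : t \mapsto h(t; y)$ is non-increasing.

For all $(x, x') \in \mathsf{X}^2$, denoting
$x \wedge x' := (\min\{x_s, x'_s\})_{s \in \mathsf{S}}$ and
$x \vee x' := (\max\{x_s, x'_s\})_{s \in \mathsf{S}}$, we define
$\alpha(x, x')$ and $\phi(x, x')$ as

\[
\alpha(x, x') = \inf_{s \in \mathsf{S}}
  \left\{ \frac{j(x_s \vee x'_s)}{j(x_s \wedge x'_s)} \right\}
\quad\text{and}\quad \phi(x, x') = x \wedge x'
\quad\text{under (F-1)} ;
\]

\[
\alpha(x, x') = \inf_{s \in \mathsf{S}}
  \left\{ \frac{j(x_s \wedge x'_s)}{j(x_s \vee x'_s)} \right\}
\quad\text{and}\quad \phi(x, x') = x \vee x'
\quad\text{under (F-2)} .
\]

Then $\alpha$ and $\phi$ defined above satisfy (A-8)(i).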

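Proof. We only prove this result under Condition (F-1). The proof is similar
under (F-2).

Since for all $t \in \mathsf{C}$, $y \mapsto j(t) h(t; y)$ is a density
with respect to $\nu$, we have

\[
j(t) = \left( \int h(t; y)\, \nu(\mathrm{d}y) \right)^{-1} .
\]

Thus $j$ is non-increasing on $\mathsf{C}$. Clearly, the function $\alpha$
defined above takes values in $[0,1]$ and $\phi$ defines a function from
$\mathsf{X}^2$ to $\mathsf{X}$. For all $(x, x') \in \mathsf{X}^2$ and
$y \in \mathsf{Y}$, we have

\[
\begin{aligned}
g(x; y) &= \int_{\mathsf{S}} j(x_s)\, h(x_s; y)\, \mu(\mathrm{d}s)
  \geq \int_{\mathsf{S}} j(x_s \vee x'_s)\, h(x_s \wedge x'_s; y)\, \mu(\mathrm{d}s) \\
 &\geq \alpha(x, x') \int_{\mathsf{S}} j(x_s \wedge x'_s)\, h(x_s \wedge x'_s; y)\, \mu(\mathrm{d}s)
  = \alpha(x, x')\, g(\phi(x, x'); y) .
\end{aligned}
\]

By symmetry of $\alpha$ and $\phi$, we get (22) and thus (A-8)(i) holds.

□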
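
The conclusion of Lemma 5 can be probed numerically. The sketch below takes the Gaussian mixture case, with $j(t) = (2\pi t)^{-1/2}$ and $h(t; y) = \exp(-y^2/(2t))$ (so that (F-1) holds) and a finite set $\mathsf{S}$ with weights `gamma`, and evaluates both sides of (22). The function names are hypothetical; this is a sanity check under these assumptions, not a substitute for the proof.

```python
import math


def j(t):
    """Normalizing factor of a centered Gaussian density with variance t."""
    return 1.0 / math.sqrt(2.0 * math.pi * t)


def h(t, y):
    """Unnormalized Gaussian kernel; non-decreasing in t for every y, as in (F-1)."""
    return math.exp(-y * y / (2.0 * t))


def g(x, y, gamma):
    """Mixture density (24) over a finite index set with weights gamma."""
    return sum(w * j(t) * h(t, y) for w, t in zip(gamma, x))


def alpha(x, xp):
    """Lemma 5 under (F-1): infimum over s of j(max) / j(min)."""
    return min(j(max(s, sp)) / j(min(s, sp)) for s, sp in zip(x, xp))


def phi(x, xp):
    """Lemma 5 under (F-1): componentwise minimum of x and x'."""
    return [min(s, sp) for s, sp in zip(x, xp)]
```

For any pair `x`, `xp` with positive entries and any real `y`, one expects `min(g(x, y, gamma), g(xp, y, gamma))` to be at least `alpha(x, xp) * g(phi(x, xp), y, gamma)`, up to floating-point rounding.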
4 Examples


Let us now apply these results to prove the convergence of the MLE in
Examples 1, 2 and 3.

4.1 NBIN-GARCH model

Example 1 is a specific case of Definition 1 where $\nu$ is the counting
measure on $\mathsf{Y} = \mathbb{N}$,

\[
\psi^\theta_y(x) = \omega + a x + b y ,
\tag{25}
\]

\[
g^\theta(x; y) = \frac{\Gamma(y + r)}{y!\,\Gamma(r)}
  \left( \frac{1}{1+x} \right)^{r} \left( \frac{x}{1+x} \right)^{y} ,
\tag{26}
\]

with $\theta = (\omega, a, b, r)$ in a compact subset $\Theta$ of
$(0, \infty)^4$ and $\mathsf{X} = (0, \infty)$.

In [20, Theorem 1], the equation satisfied by the mean of the observations
$\mu_k = \mathbb{E}[Y_k]$ is derived and is shown to admit a constant
solution if and only if

\[
rb + a < 1 .
\tag{27}
\]

This clearly implies that this condition is necessary to have a stationary
solution $\{Y_k\}$ with finite mean. However, it does not imply the
existence of such a solution. In fact, the following result shows that (27)
is indeed a necessary and sufficient condition to have a stationary
solution $\{Y_k\}$ with finite mean. It also shows that all the assumptions
of Theorem 3 hold, which, with Remark 3, provides the consistency of the
MLE $\hat\theta_{x_1,n}$ for any $x_1 \in \mathsf{X}$.
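
As a small worked illustration of Condition (27), the sketch below checks the condition and returns the stationary means. The formulas are an assumption derived here from the recursion rather than quoted from [20]: taking expectations in $X_{k+1} = \omega + aX_k + bY_k$ with $\mathbb{E}[Y_k \mid X_k] = rX_k$ gives $\mathbb{E}[X] = \omega/(1 - a - rb)$ and $\mathbb{E}[Y] = r\,\mathbb{E}[X]$ for a stationary solution with finite mean. The function name is hypothetical.

```python
def nbin_garch_stationary_means(omega, a, b, r):
    """Return (E[X], E[Y]) for a stationary NBIN-GARCH(1,1) with finite mean."""
    contraction = a + b * r
    if contraction >= 1.0:
        # Condition (27) fails: no stationary solution with finite mean.
        raise ValueError("condition r*b + a < 1 is not satisfied")
    mean_x = omega / (1.0 - contraction)
    return mean_x, r * mean_x
```

For instance, `nbin_garch_stationary_means(1.0, 0.3, 0.2, 2.0)` corresponds to $a + rb = 0.7$, a hidden mean of $10/3$ and an observation mean of $20/3$.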

Theorem 6. Suppose that all $\theta = (\omega, a, b, r)$ in $\Theta$
satisfy Condition (27). Then Assumptions (A-1), (A-2), (B-2), (B-3) and
(B-4) hold with $\bar V$ being defined as the identity function on
$\mathsf{X}$ and with any $x_1 \in \mathsf{X}$.


Proof. For convenience, we divide the proof into two steps. 

Step 1. We first prove Assumptions (A-1) and (A-2) by applying Theorem 4.
We set $V(x) = \bar V(x) = x$ and thus we only need to check (A-3), (A-4),
(A-5), (A-6), (A-7) and (A-8). Condition (A-3) holds. We have, for all
$\theta \in \Theta$,

\[
RV(x) = \omega + (a + br)\,x = (a + br)\,V(x) + \omega ,
\]

which yields (A-4). The fact that the kernel $R$ is weak Feller easily
follows by observing that, as $p \to p'$, $\mathcal{NB}(r, p)$ converges
weakly to $\mathcal{NB}(r, p')$, so (A-5) holds.

We now prove (A-6). Let $x_\infty = \omega/(1-a)$. Let $x \in \mathsf{X}$
and define recursively the sequence $x_0 = x$,
$x_k = \omega + a x_{k-1}$ for all positive integers $k$. Since
$0 < a < 1$, this sequence converges to the fixed point $x_\infty$.
Therefore, defining $\mathbb{P}_x$ as in (23), for any neighborhood
$\mathcal{N}$ of $x_\infty$, there exists some $n$ such that
$x_n \in \mathcal{N}$ and we have

\[
\begin{aligned}
R^n(x; \mathcal{N}) = \mathbb{P}_x(X_n \in \mathcal{N})
 &\geq \mathbb{P}_x(X_k = x_k \text{ for all } k = 1, \dots, n) \\
 &= \mathbb{P}_x(Y_0 = \dots = Y_{n-1} = 0) > 0 .
\end{aligned}
\]

So (A-6) holds. Assumption (A-7) holds since we have, for all
$(x, x', y) \in \mathsf{X}^2 \times \mathsf{Y}$ with $x \neq x'$,

\[
\frac{|\psi_y(x) - \psi_y(x')|}{|x - x'|} = a < 1 .
\]

To prove (A-8), we apply Lemma 5 with $\mathsf{C} = \mathsf{X}$,
$\mathsf{S} = \{1\}$ (so $\mu$ boils down to the Dirac measure on
$\{1\}$). For all $(x, y) \in \mathsf{X} \times \mathsf{Y}$, let
$j(x) = (1+x)^{-r}$ and
$h(x; y) = \frac{\Gamma(y+r)}{y!\,\Gamma(r)} \left( \frac{x}{1+x} \right)^{y}$.
Indeed, $h$ satisfies (F-1). Thus by Lemma 5, for all
$(x, x') \in \mathsf{X}^2$, we get that

\[
\alpha(x, x') = \left( \frac{1 + x \wedge x'}{1 + x \vee x'} \right)^{r} \in (0, 1]
\quad\text{and}\quad \phi(x, x') = x \wedge x'
\]

satisfy (A-8)(i). For any given $r > 0$, let a function
$W : \mathsf{X}^2 \to [1, \infty)$ be defined by, for all
$(x, x') \in \mathsf{X}^2$, $W(x, x') = 1 \vee r$. By definition of $W$, as
a constant function, (A-8)(ii) and (A-8)(iv) clearly hold. Moreover,
(A-8)(iii) holds since for all $(x, x') \in \mathsf{X}^2$, we have that

\[
1 - \alpha(x, x') \leq (1 \vee r)\,|x - x'| = W(x, x')\,|x - x'| .
\]

Therefore, (A-8) holds, which completes Step 1.
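
The inequality used for (A-8)(iii) can be checked numerically. The sketch below encodes $\alpha(x,x') = ((1 + x \wedge x')/(1 + x \vee x'))^r$ and the constant $W = 1 \vee r$ from Step 1 and compares both sides; the function names are illustrative assumptions.

```python
def alpha_nb(x, xp, r):
    """alpha(x, x') obtained from Lemma 5 in the NBIN-GARCH case."""
    return ((1.0 + min(x, xp)) / (1.0 + max(x, xp))) ** r


def w_nb(r):
    """Constant function W = max(1, r) used to verify (A-8)(iii)."""
    return max(1.0, r)


def a8_iii_gap(x, xp, r):
    """Right-hand side minus left-hand side of (A-8)(iii); expected to be >= 0."""
    return w_nb(r) * abs(x - xp) - (1.0 - alpha_nb(x, xp, r))
```

Writing $u = |x - x'|/(1 + x \vee x') \in [0, 1)$, the gap is nonnegative because $1 - (1-u)^r \leq (1 \vee r)\,u$ and $u \leq |x - x'|$.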

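Step 2. We now prove (B-2), (B-3) and (B-4). By assumption on $\Theta$,
there exists
$(\underline\omega, \bar\omega, \underline b, \bar b, \underline r, \bar r, \underline a, \bar a) \in (0,\infty)^6 \times (0,1)^2$
such that

\[
\underline\omega \leq \omega \leq \bar\omega , \quad
\underline b \leq b \leq \bar b , \quad
\underline r \leq r \leq \bar r , \quad
\underline a \leq a + br \leq \bar a .
\]

Clearly, (B-2) and (B-3) hold by the definitions of $\psi^\theta_y(x)$ and
$g^\theta(x; y)$. It remains to check (B-4) for a well-chosen closed subset
$\mathsf{X}_1$ and any $x_1 \in \mathsf{X}$. Let
$\mathsf{X}_1 = [\underline\omega, \infty) \subset \mathsf{X}$ so that
(B-4)(i) holds. By noting that for all
$(\theta, x, y) \in \Theta \times \mathsf{X} \times \mathsf{Y}$,
$g^\theta(x; y) \leq 1$, we have (B-4)(ii). From (9) and (25), we have, for
all $s \leq t$, $y_{s:t} \in \mathsf{Y}^{t-s+1}$, $x \in \mathsf{X}$ and
$\theta \in \Theta$,

\[
\psi^\theta\langle y_{s:t}\rangle(x)
  = \omega\, \frac{1 - a^{t-s+1}}{1 - a} + a^{t-s+1} x
    + b \sum_{j=0}^{t-s} a^{j}\, y_{t-j} .
\tag{28}
\]

Using (28), we have, for all $\theta \in \Theta$, $x \in \mathsf{X}$ and
$y_{1:n} \in \mathsf{Y}^n$,

\[
\big| \psi^\theta\langle y_{1:n}\rangle(x_1) - \psi^\theta\langle y_{1:n}\rangle(x) \big|
  = a^{n}\, |x_1 - x| \leq \bar a^{\,n}\, |x_1 - x| .
\]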
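
The closed form (28) and the resulting contraction can be compared with a direct iteration of the affine map (25). The sketch below is a plain check under the assumption $a \neq 1$; the function names are hypothetical.

```python
def psi_iterated(y_block, x, omega, a, b):
    """Apply psi_{y_s}, ..., psi_{y_t} successively, as in the composition (9)."""
    for y in y_block:
        x = omega + a * x + b * y
    return x


def psi_closed_form(y_block, x, omega, a, b):
    """Closed form (28); y_block[0] is y_s and y_block[-1] is y_t."""
    n = len(y_block)
    drift = omega * (1.0 - a ** n) / (1.0 - a)
    memory = b * sum(a ** j * y_block[n - 1 - j] for j in range(n))
    return drift + a ** n * x + memory
```

Two paths driven by the same observations but started at different points differ exactly by $a^n$ times the initial gap, which is the geometric forgetting used for (B-4)(iii).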
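This gives (B-4)(iii) and (B-4)(iv) by setting $\varrho = \bar a < 1$ and
$\bar\psi(x) = |x_1 - x|$. Next we set $\bar\phi$, $H$ and $C$ to meet
Conditions (B-4)(v), (B-4)(vi) and (B-4)(vii). Let us write, for all
$\theta \in \Theta$ and $y \in \mathsf{Y}$,

\[
\big| x_1 - \psi^\theta_y(x_1) \big|
  \leq \omega + (1 + a)\,x_1 + b y
  \leq \bar\omega + (1 + \bar a)\,x_1 + \bar b\, y
\]

and, for all $(x, x') \in \mathsf{X}_1^2 = [\underline\omega, \infty)^2$,

\[
\begin{aligned}
\left| \ln \frac{g^\theta(x; y)}{g^\theta(x'; y)} \right|
 &= \big| (r + y)\,[\ln(1 + x') - \ln(1 + x)] + y\,[\ln x - \ln x'] \big| \\
 &\leq \big[ (r + y)(1 + \underline\omega)^{-1} + y\,\underline\omega^{-1} \big]\, |x - x'| \\
 &\leq \big[ \bar r + y\,(1 + \underline\omega^{-1}) \big]\, |x - x'| .
\end{aligned}
\]

Setting
$\bar\phi(y) = \bar\omega \vee \bar r + (1 + \bar a)\,x_1 + (\bar b \vee (1 + \underline\omega^{-1}))\, y$,
$H(x) = x$ and $C = 0$ then yields Conditions (B-4)(v), (B-4)(vi) and
(B-4)(vii). Now (B-4)(viii) follows from

\[
\int \ln^+(y)\, G^\theta(x; \mathrm{d}y) \leq \int y\, G^\theta(x; \mathrm{d}y)
  = r x \leq \bar r\, \bar V(x) .
\]

This concludes the proof. □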


4.2 NM-GARCH model 

The NM(d)-GARCH(1,1) of Example 2 is a specific case of Definition 1
where $\mathsf{X} = (0,\infty)^d$ and $\nu$ is the Lebesgue measure on
$\mathsf{Y} = \mathbb{R}$,

\[
\psi^\theta_y(x) = \omega + A x + y^2\, b ,
\tag{29}
\]

\[
g^\theta(x; y) = \sum_{\ell=1}^{d} \gamma_\ell\,
  \frac{\mathrm{e}^{-y^2/(2x_\ell)}}{(2\pi x_\ell)^{1/2}} ,
\qquad (x, y) \in \mathsf{X} \times \mathsf{Y} ,
\tag{30}
\]

and $\theta = (\gamma, \omega, A, b) \in \Theta$, a compact subset of
$\mathcal{P}_d \times (0,\infty)^d \times \mathbb{R}_+^{d \times d} \times \mathbb{R}_+^d$,
with $\mathcal{P}_d$ defined by (3).

In [12], it is shown that the equation satisfied by the variance of a
univariate NM(d)-GARCH(1,1) process admits a constant solution if and only
if

\[
|\lambda|_{\max}(A + b\gamma^T) < 1 ,
\tag{31}
\]

where, for any square matrix $M$, $|\lambda|_{\max}(M)$ denotes the
spectral radius of $M$. It follows that the existence of a weakly
stationary solution implies (31), but this does not say anything about the
existence of a stationary or weakly stationary solution. The result below
shows that (31) is indeed a sufficient condition for the existence of a
stationary solution with finite variance. Together with Theorem 3 and
Remark 3, it moreover provides the consistency of the MLE
$\hat\theta_{x_1,n}$ for any $x_1 \in \mathsf{X}$.
Theorem 7. Suppose that all $\theta = (\gamma, \omega, A, b)$ in $\Theta$
satisfy Condition (31). Then Assumptions (A-1), (A-2), (B-2), (B-3) and
(B-4) hold with $\bar V$ being defined as any norm on $\mathsf{X}$.

Proof. In this proof section, we set

\[
\bar V(x) = |x| = \sum_{i=1}^{d} |x_i| ,
\tag{32}
\]

for all $x = (x_i) \in \mathsf{X}$. As in Theorem 6, we divide the proof
into two steps.
Step 1. We first show that Assumptions (A-1) and (A-2) hold with the
above $\bar V$ by applying Theorem 4. Define $V$ on $\mathsf{X}$ by setting

\[
V(x) = (\mathbf{1} + x_0)^T x ,
\]

where $\mathbf{1}$ is the vector of $\mathsf{X}$ with all entries equal to
1 and $x_0$ is defined by

\[
\mathbf{1} + x_0 = \big( I - (A + b\gamma^T)^T \big)^{-1} \mathbf{1} .
\]

We indeed note that by Condition (31) the above inversion is well defined
and moreover

\[
\big( I - (A + b\gamma^T)^T \big)^{-1}
  = I + \sum_{k \geq 1} \big( (A + b\gamma^T)^T \big)^{k} ,
\]

and, since $A$, $b$, $\gamma$ all have non-negative entries, it follows
that $x_0$ has non-negative entries. Thus, for all
$x = (x_i) \in \mathsf{X}$,

\[
\bar V(x) = \mathbf{1}^T x \leq V(x) ,
\]

so that $\bar V \lesssim V$. Hence by Theorem 4, we only need to check
(A-3), (A-4), (A-5), (A-6), (A-7) and (A-8) with $V$ defined as above for a
given $\theta = (\gamma, \omega, A, b) \in \Theta$ (so we drop $\theta$
from the notation in the remainder of Step 1). Condition (A-3) holds for
any metric $d$ associated to a norm on the finite dimensional space
$\mathsf{X}$. (The precise choice of $d$ is postponed to the verification
of (A-7).) We have

\[
\begin{aligned}
RV(x) &= \int V(\omega + A x + y^2 b)\, G(x; \mathrm{d}y) \\
 &= (\mathbf{1} + x_0)^T \omega + (\mathbf{1} + x_0)^T (A + b\gamma^T)\, x \\
 &= V(\omega) + \mathbf{1}^T \big( I - (A + b\gamma^T) \big)^{-1} (A + b\gamma^T - I + I)\, x \\
 &= V(\omega) + x_0^T x \\
 &\leq V(\omega) + \lambda V(x) ,
\end{aligned}
\]

where we set $\lambda = \max_\ell \{ x_{0,\ell} / (1 + x_{0,\ell}) \} < 1$.
Hence (A-4) holds. Condition (A-5) easily follows from the continuity of
the Gaussian distribution with respect to its variance parameter. We now
prove (A-6). From (9) and (29), we have, for all $n \geq 1$,
$y_{0:n-1} \in \mathsf{Y}^n$ and $x \in \mathsf{X}$,

\[
\psi\langle y_{0:n-1}\rangle(x)
  = A^{n} x + \sum_{j=0}^{n-1} A^{j} \big( \omega + y_{n-1-j}^2\, b \big) .
\tag{33}
\]
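
The drift function $V$ used above for (A-4) can be computed without an explicit matrix inversion by summing the Neumann series that defines $\mathbf{1} + x_0$. The sketch below does so with plain lists; the function name, the stopping tolerance and the iteration cap are assumptions made for illustration.

```python
def drift_quantities(A, b, gamma, tol=1e-13, max_iter=100000):
    """Return (M, v, lam) with M = A + b gamma^T, v = 1 + x0 and lam = max x0_l / (1 + x0_l)."""
    d = len(b)
    M = [[A[i][j] + b[i] * gamma[j] for j in range(d)] for i in range(d)]
    v = [1.0] * d        # running sum of (M^T)^k 1 over k >= 0
    term = [1.0] * d     # current term (M^T)^k 1
    for _ in range(max_iter):
        # Multiply the current term by M^T (entries stay non-negative).
        term = [sum(M[i][j] * term[i] for i in range(d)) for j in range(d)]
        v = [s + t for s, t in zip(v, term)]
        if max(term) < tol:
            lam = max((s - 1.0) / s for s in v)
            return M, v, lam
    raise ValueError("Neumann series did not converge; Condition (31) may fail")
```

By construction $M^T v = v - \mathbf{1}$ up to the tolerance, which is the identity $(\mathbf{1} + x_0)^T (A + b\gamma^T) = x_0^T$ used in the computation of $RV$.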


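Let us use the norm

\[
\|M\| = \max_{1 \leq j \leq d} \sum_{i=1}^{d} |M_{i,j}| = \sup_{|x| \leq 1} |Mx|
\]

on $d \times d$ matrices. Note that by (31), there exist
$\delta \in (0,1)$ and $c > 0$ such that, for any $k \geq 1$,

\[
\big\| (A + b\gamma^T)^{k} \big\| \leq c\, \delta^{k} .
\tag{34}
\]

Using that $A$, $b$, $\gamma$ all have nonnegative entries, we have

\[
\big\| A^{k} \big\| \leq \big\| (A + b\gamma^T)^{k} \big\| .
\tag{35}
\]

Hence $(I - A)^{-1} = I + \sum_{k \geq 1} A^{k}$ is well defined and we set
$x_\infty = (I - A)^{-1} \omega$ so that, with (29), we have

\[
\psi\langle y_{0:n-1}\rangle(x) - x_\infty
  = A^{n} x - \sum_{j \geq n} A^{j} \omega + \sum_{j=0}^{n-1} y_{n-1-j}^2\, A^{j} b .
\]

Then, using definition (23), we get that, $\mathbb{P}_x$-a.s., for all
$n \geq 1$,

\[
|X_n - x_\infty| = \big| \psi\langle Y_{0:n-1}\rangle(x) - x_\infty \big|
  \leq |A^{n} (x - x_\infty)| + \sum_{j \geq n} |A^{j} \omega|
    + \Big( \max_{0 \leq j \leq n-1} Y_j^2 \Big) \sum_{j=0}^{n-1} |A^{j} b| .
\]

With (34) and (35), this implies

\[
\mathbb{P}_x \left( |X_n - x_\infty| \leq c \left[ \delta^{n}
  \left( |x - x_\infty| + \frac{|\omega|}{1 - \delta} \right)
  + \frac{|b|}{1 - \delta} \max_{0 \leq j \leq n-1} Y_j^2 \right] \right) = 1 .
\]

To obtain (A-6), it is sufficient to observe that, since $g$ takes positive
values in (30), for any positive $\epsilon$, $x \in \mathsf{X}$ and any
$n \geq 1$,

\[
\mathbb{P}_x \Big( \max_{0 \leq j \leq n-1} Y_j^2 < \epsilon \Big) > 0 .
\]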

Next we prove (A-7). We have

\[
\psi_y(x) - \psi_y(x') = A(x - x') .
\]

Since (34) and (35) imply that $|\lambda|_{\max}(A) < 1$, there exists a
vector norm which makes A strictly contracting. Choosing the metric d on X as
the one derived from this norm, we get (A-7). To show (A-8), we again rely
on Lemma 5. Let us set $\mathsf{C} = (0, \infty)$ and
$\mathsf{S} = \{1, \dots, d\}$ and define the probability measure $\mu$ on S
by $\mu(\{s\}) = \gamma_s$, for all $s \in \mathsf{S}$. For all
$(t, y) \in \mathsf{C} \times \mathsf{Y}$, let $j(t) = (2\pi t)^{1/2}$ and
$h(t; y) = \exp(-y^2/2t)$. Obviously, Relation (24) holds and h satisfies
(F-1). Hence, Lemma 5 implies that $\alpha$ and $\phi$ defined respectively
for all $x = (x_1, \dots, x_d)$, $x' = (x'_1, \dots, x'_d) \in \mathsf{X}$ by

\[
\alpha(x, x') = \min_{1 \le \ell \le d}
  \Bigl( \frac{x_\ell \wedge x'_\ell}{x_\ell \vee x'_\ell} \Bigr)^{1/2} \in (0, 1]
\quad\text{and}\quad
\phi(x, x') = (x_1 \wedge x'_1, \dots, x_d \wedge x'_d) ,
\]


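satisfy (A-8)(i). For $x = (x_1, \dots, x_d)$,
$x' = (x'_1, \dots, x'_d) \in \mathsf{X}$, we have

\[
\begin{aligned}
1 - \alpha(x, x')
  &= 1 - \min_{1 \le \ell \le d}
       \Bigl( 1 - \frac{|x_\ell - x'_\ell|}{x_\ell \vee x'_\ell} \Bigr)^{1/2} \\
  &\le \max_{1 \le \ell \le d}
       \Bigl\{ \frac{|x_\ell - x'_\ell|}{x_\ell \vee x'_\ell} \Bigr\} \\
  &\le \min_{1 \le \ell \le d} \bigl( x_\ell^{-1} \wedge x_\ell'^{-1} \bigr)\, |x - x'| \\
  &\le W(x, x')\, d(x, x') ,
\end{aligned}
\]

where d is the metric previously defined and W is defined by
$W(x, x') = 1 \vee \bigl( c_d \min_{1 \le \ell \le d} ( x_\ell^{-1} \wedge x_\ell'^{-1} ) \bigr)$,
where $c_d > 0$ is conveniently chosen (such a constant exists since d is
the metric associated to a norm and X has finite dimension). Then (A-8)(ii)
and (A-8)(iii) hold and, since for all $y \in \mathsf{Y}$ and
$x \in \mathsf{X}$, $\psi_y(x)$ has all its entries bounded from below by the
positive entries of $\omega$, $W(\psi_y(x), \psi_y(x'))$ is uniformly bounded
over $(x, x', y) \in \mathsf{X} \times \mathsf{X} \times \mathsf{Y}$ and
(A-8)(iv) holds. This completes Step 1.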

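Step 2. We now show that Assumptions (B-2), (B-3) and (B-4) hold.

Clearly, (B-2) and (B-3) hold by definitions of $\psi^\theta_y(x)$ and
$g^\theta(x; y)$. It remains to show (B-4). Since $\Theta$ is compact, we have

\[
\underline{\omega} \le \min_{\ell} \omega_\ell, \quad
|\omega| \le \overline{\omega}, \quad
\underline{b} \le |b| \le \overline{b}, \quad
|\lambda|_{\max}(A + b\gamma^T) \le \overline{\rho}, \quad
\|A + b\gamma^T\| \le L
\]

for some
$(\underline{\omega}, \overline{\omega}, \underline{b}, \overline{b}, \overline{\rho}) \in (0, \infty)^4 \times (0, 1)$
and $L > 0$. By [16, Lemma 12], this implies that, for all
$\delta \in (\overline{\rho}, 1)$, there exists $C > 0$ such that for all
$k \ge 1$ and all $\theta \in \Theta$,

\[
\bigl\|(A + b\gamma^T)^k\bigr\| \le C \delta^k . \tag{36}
\]

We set $\mathsf{X}_1 = [\underline{\omega}, \infty)^d \subset \mathsf{X}$ so
that (B-4)(i) holds. Moreover, for all
$(\theta, x, y) \in \Theta \times \mathsf{X}_1 \times \mathsf{Y}$,
$g^\theta(x; y) \le (2\pi\underline{\omega})^{-1/2}$. Thus, Condition
(B-4)(ii) holds. Now let $x_1 \in \mathsf{X}$. Using (33), (36) and (35), we
have, for all $x \in \mathsf{X}$, $y_{1:n} \in \mathsf{Y}^n$ and
$\theta \in \Theta$,

\[
\bigl| \psi^\theta\langle y_{1:n}\rangle(x_1) - \psi^\theta\langle y_{1:n}\rangle(x) \bigr|
  = |A^n (x_1 - x)| \le C \delta^n |x_1 - x| .
\]

Using that the norm defining d is equivalent to the norm $|\cdot|$, we get
(B-4)(iii) with

\[
\bar\psi(x) = C' |x_1 - x| ,
\]

for some positive constant $C'$. Hence (B-4)(iv) holds and since

\[
|x_1 - \psi^\theta_y(x_1)| \le (L + 1) |x_1| + \overline{\omega} + y^2\, \overline{b} ,
\]

we also get (B-4)(v) provided that

\[
\bar\phi(y) \ge (L + 1) |x_1| + \overline{\omega} + y^2\, \overline{b} . \tag{37}
\]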

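It is straightforward to show that, for all $\theta \in \Theta$,
$x \in \mathsf{X}_1$, $y \in \mathbb{R}$, and $\ell \in \{1, \dots, d\}$,

\[
\Bigl| \frac{\partial \ln g^\theta}{\partial x_\ell}(x; y) \Bigr|
  \le \frac{1}{2\underline{\omega}} + \frac{y^2}{\underline{\omega}^2} .
\]

Thus, by the mean value theorem, for all $\theta \in \Theta$,
$(x, x') \in \mathsf{X}_1 \times \mathsf{X}_1$ and $y \in \mathsf{Y}$,

\[
\bigl| \ln g^\theta(x; y) - \ln g^\theta(x'; y) \bigr|
  \le \Bigl( \frac{1}{2\underline{\omega}} + \frac{y^2}{\underline{\omega}^2} \Bigr) |x - x'| .
\]

We thus obtain (B-4)(v), (B-4)(vi) and (B-4)(vii) by setting $C = 0$,

\[
H(u) = \sup_{d(x, x') \le u} |x - x'| ,
\]

and

\[
\bar\phi(y) = (L + 1) |x_1| + \overline{\omega} + 1/(2\underline{\omega})
  + y^2 \bigl( \overline{b} + \underline{\omega}^{-2} \bigr) .
\]

In addition, for all $\theta \in \Theta$ and $x \in \mathsf{X}$, we have

\[
\int y^2\, G^\theta(x, \mathrm{d}y) = \gamma^T x .
\]

Hence, using (32) with the above definitions, we obtain (B-4)(viii) and the
proof is concluded. □

4.3 The Threshold INGARCH model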


The threshold INGARCH(1,1) in Example 3 is a specific case of Definition 1
where $\nu$ is the counting measure on $\mathsf{Y} = \mathbb{Z}_+$,

\[
\psi^\theta_y(x) = \omega + a x + b y , \tag{38}
\]

\[
g^\theta(x; y) = e^{-(x \wedge \tau)}\, \frac{(x \wedge \tau)^y}{y!} , \tag{39}
\]

with $\theta = (\omega, a, b, \tau)$ in a compact subset $\Theta$ of
$(0, \infty)^4$ and $\mathsf{X} = (0, \infty)$. In this model, if $a < 1$, we
have the ergodicity and consistency results stated in Theorem 8 below.

Theorem 8. Suppose that all $\theta = (\omega, a, b, \tau)$ in $\Theta$
satisfy $a < 1$. Then Assumptions (A-1), (A-2), (B-2), (B-3) and (B-4) hold
with V being defined as the identity function on X and with any
$x_1 \in \mathsf{X}$.

Proof. As in the proofs of the two theorems above, for convenience, we divide
the proof into two steps.

Step 1. We first prove Assumptions (A-1) and (A-2) by applying Theorem 4.
We set $V(x) = \bar V(x) = x$ and thus we only need to check (A-3), (A-4),
(A-5), (A-6), (A-7) and (A-8). Condition (A-3) holds with the usual metric
on $\mathbb{R}$. We have for all $\theta \in \Theta$,

\[
RV(x) = \omega + a x + b (x \wedge \tau) \le a V(x) + (\omega + b\tau) ,
\]

which yields (A-4). The fact that the kernel R is weak Feller easily follows
by observing that, as $x \to x'$, the Poisson distribution $\mathcal{P}(x)$
converges weakly to $\mathcal{P}(x')$ and the map $x \mapsto x \wedge \tau$ is
continuous, so (A-5) holds.

The proof of (A-6) is similar to the NBIN-GARCH case of Theorem 6 and is
thus omitted. Assumption (A-7) holds since we have for all
$(x, x', y) \in \mathsf{X}^2 \times \mathsf{Y}$ with $x \ne x'$,

\[
\frac{|\psi_y(x) - \psi_y(x')|}{|x - x'|} = a < 1 .
\]

To prove (A-8), we apply Lemma 5 with $\mathsf{C} = \mathsf{X}$,
$\mathsf{S} = \{1\}$ (so $\mu$ boils down to the Dirac measure on $\{1\}$).
For all $(x, y) \in \mathsf{X} \times \mathsf{Y}$, let
$j(x) = e^{x \wedge \tau}$ and $h(x; y) = (x \wedge \tau)^y / y!$. Then h
indeed satisfies (F-1). Thus by Lemma 5, for all $(x, x') \in \mathsf{X}^2$
and $y \in \mathsf{Y}$, we get that

\[
\alpha(x, x') = e^{-(x \vee x') \wedge \tau + (x \wedge x') \wedge \tau}
\quad\text{and}\quad
\phi(x, x') = x \wedge x'
\]

satisfy (A-8)(i).

Let $W(x, x') = 1$ for all $(x, x') \in \mathsf{X}^2$, which is a constant
function. Thus (A-8)(ii) and (A-8)(iv) clearly hold. Moreover, (A-8)(iii)
holds since for all $(x, x') \in \mathsf{X}^2$, we have that

\[
1 - \alpha(x, x') \le x \vee x' - x \wedge x' = |x - x'| = W(x, x')\, |x - x'| .
\]

Therefore, (A-8) holds, which completes Step 1.
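
The bound used for (A-8)(iii) in Step 1 rests on two elementary facts that
the text leaves implicit. The short verification below is a sketch added for
the reader, not part of the original argument: it uses $1 - e^{-u} \le u$ for
$u \ge 0$ and the fact that $t \mapsto t \wedge \tau$ is nondecreasing and
1-Lipschitz.

```latex
\begin{align*}
1 - \alpha(x, x')
  &= 1 - \exp\bigl( -\{ (x \vee x') \wedge \tau - (x \wedge x') \wedge \tau \} \bigr) \\
  &\le (x \vee x') \wedge \tau - (x \wedge x') \wedge \tau
     && \text{since } 1 - e^{-u} \le u \text{ for } u \ge 0 \\
  &\le x \vee x' - x \wedge x'
     && \text{since } t \mapsto t \wedge \tau \text{ is 1-Lipschitz} \\
  &= |x - x'| .
\end{align*}
```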

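Step 2. We now prove (B-2), (B-3) and (B-4). By assumption on $\Theta$, there
exists
$(\underline{\omega}, \overline{\omega}, \underline{b}, \overline{b}, \underline{\tau}, \overline{\tau}, \underline{a}, \overline{a}) \in (0, \infty)^6 \times (0, 1)^2$
such that

\[
\underline{\omega} \le \omega \le \overline{\omega}, \quad
\underline{b} \le b \le \overline{b}, \quad
\underline{\tau} \le \tau \le \overline{\tau}, \quad
\underline{a} \le a \le \overline{a} .
\]

Clearly, (B-2) and (B-3) hold by definitions of $\psi^\theta_y(x)$ and
$g^\theta(x; y)$. It remains to check (B-4) for a well-chosen closed subset
$\mathsf{X}_1$ and any $x_1 \in \mathsf{X}$. Let
$\mathsf{X}_1 = [\underline{\omega}, \infty) \subset \mathsf{X}$ so that
(B-4)(i) holds. By noting that for all
$(\theta, x, y) \in \Theta \times \mathsf{X} \times \mathsf{Y}$,
$g^\theta(x; y) \le 1$, we have (B-4)(ii). From (9) and (38), we have for
all $s \le t$, $y_{s:t} \in \mathsf{Y}^{t-s+1}$, $x \in \mathsf{X}$ and
$\theta \in \Theta$,

\[
\psi^\theta\langle y_{s:t}\rangle(x)
  = \omega \Bigl( \frac{1 - a^{t-s+1}}{1 - a} \Bigr) + a^{t-s+1} x
    + b \sum_{j=0}^{t-s} a^j y_{t-j} . \tag{40}
\]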

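Using (40), we have, for all $\theta \in \Theta$, $x \in \mathsf{X}$ and
$y_{1:n} \in \mathsf{Y}^n$,

\[
\bigl| \psi^\theta\langle y_{1:n}\rangle(x_1) - \psi^\theta\langle y_{1:n}\rangle(x) \bigr|
  = a^n |x_1 - x| \le \overline{a}^{\,n} |x_1 - x| .
\]

This gives (B-4)(iii) and (B-4)(iv) by setting
$\varrho = \overline{a} < 1$ and $\bar\psi(x) = |x_1 - x|$. Next we set
$\bar\phi$, H and C to meet Conditions (B-4)(v), (B-4)(vi) and (B-4)(vii).
Let us write, for all $\theta \in \Theta$ and $y \in \mathsf{Y}$,

\[
|x_1 - \psi^\theta_y(x_1)| \le \omega + (1 + a) x_1 + b y
  \le \overline{\omega} + (1 + \overline{a}) x_1 + \overline{b}\, y
\]

and, for all $(x, x') \in \mathsf{X}_1^2 = [\underline{\omega}, \infty)^2$,

\[
\bigl| \ln g^\theta(x; y) - \ln g^\theta(x'; y) \bigr|
  = \bigl| (x' \wedge \tau - x \wedge \tau) + y \bigl( \ln(x \wedge \tau) - \ln(x' \wedge \tau) \bigr) \bigr|
  \le \bigl( 1 + (\underline{\omega} \wedge \underline{\tau})^{-1} y \bigr) |x - x'| .
\]

Setting
$\bar\phi(y) = 1 + \overline{\omega} + (1 + \overline{a}) x_1 + \bigl( \overline{b} \vee (\underline{\omega} \wedge \underline{\tau})^{-1} \bigr) y$,
$H(x) = x$ and $C = 0$ then yield Conditions (B-4)(v), (B-4)(vi) and
(B-4)(vii). Now (B-4)(viii) follows from

\[
\int \ln^+ y\; G^\theta(x, \mathrm{d}y) \le \int y\; G^\theta(x, \mathrm{d}y)
  = x \wedge \tau \le V(x) .
\]

This concludes the proof. □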
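
The closed form (40) and the contraction used for (B-4)(iii) can be checked
numerically. The following is a minimal sketch, not code from the paper: the
function names, the parameter values and the observation sequence are all
hypothetical and chosen only for illustration.

```python
def psi(theta, y, x):
    """One step of (38): psi_y(x) = omega + a*x + b*y."""
    omega, a, b, _tau = theta
    return omega + a * x + b * y


def psi_iterated(theta, ys, x):
    """Compose psi along ys = (y_s, ..., y_t), oldest observation first."""
    for y in ys:
        x = psi(theta, y, x)
    return x


def psi_closed_form(theta, ys, x):
    """Closed form (40) for the same composition."""
    omega, a, b, _tau = theta
    n = len(ys)  # n = t - s + 1
    geometric = (1.0 - a ** n) / (1.0 - a)
    # j = 0 corresponds to the most recent observation y_t.
    weighted = sum(a ** j * y for j, y in enumerate(reversed(ys)))
    return omega * geometric + a ** n * x + b * weighted


theta = (1.0, 0.4, 0.3, 5.0)   # hypothetical (omega, a, b, tau)
ys = [2, 0, 7, 1, 3, 4]        # hypothetical counts
x_start, x_other = 0.5, 9.0

iterated = psi_iterated(theta, ys, x_start)
closed = psi_closed_form(theta, ys, x_start)

# Two starting points driven by the same observations contract at rate a^n.
gap = abs(psi_iterated(theta, ys, x_start) - psi_iterated(theta, ys, x_other))
expected_gap = theta[1] ** len(ys) * abs(x_start - x_other)
```

The threshold $\tau$ plays no role here because it enters only through the
observation density (39), not through the recursion (38).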


5 Numerical experiments 

5.1 Numerical procedure 

In this part we provide a numerical method for computing the (conditional)
MLE $\hat\theta_{x,n}$ for the parameter $\theta = (\omega, a, b, r)$ in the
NBIN-GARCH(1,1) model introduced in Example 1 and studied in Section 4.1. It
is convenient to write $\theta = (\vartheta, r)$ with
$\vartheta = (\omega, a, b)$ and then to write $\psi^\vartheta_y(x)$ and
$g^r(x; y)$ instead of $\psi^\theta_y(x)$ and $g^\theta(x; y)$ in (25) and
(26), respectively. In contrast to the approach used in [20], we allow the
component r to be any positive real number, rather than a discrete one, and
to be unknown as well. We thus maximize jointly with respect to the
parameters $\vartheta$ and r the log-likelihood function
$\mathsf{L}^{(\vartheta, r)}_{x,n}\langle y_{1:n}\rangle = \mathsf{L}^{\theta}_{x,n}\langle y_{1:n}\rangle$.
In practice, one does not rely on a compact set $\Theta$ of parameters as in
Theorem 6. Instead the maximization is performed over all parameters
$\omega > 0$, $a > 0$, $b > 0$, $r > 0$ such that the stability constraint
$a + br < 1$ holds (taken from (27)). We use the constrained nonlinear
optimization function auglag (Augmented Lagrangian Minimization Algorithm)
from the package alabama (Augmented Lagrangian Adaptive Barrier Minimization
Algorithm) in R. For this purpose we provide an initial parameter point and
a numerical computation of the normalized log-likelihood function
$\mathsf{L}^{\theta}_{x,n}$ and of its gradient. The initial point is obtained
by applying a conditional least squares (CLS) estimation based on an
ARMA(1,1) representation of the model, see [20, Section 3]. The
log-likelihood and its derivatives are computed as follows. For all
$x \in \mathsf{X}$, denoting
$u_k^\vartheta = \psi^\vartheta\langle y_{1:k-1}\rangle(x)$ for all $k \ge 2$
and $u_1^\vartheta = x$, we have

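\[
\begin{aligned}
\mathsf{L}^{(\vartheta, r)}_{x,n}\langle y_{1:n}\rangle
  &= n^{-1} \sum_{k=1}^{n} \ln g^r\bigl( \psi^\vartheta\langle y_{1:k-1}\rangle(x); y_k \bigr) \\
  &= n^{-1} \ln g^r(x; y_1) + n^{-1} \sum_{k=2}^{n} \ln g^r(u_k^\vartheta; y_k) .
\end{aligned}
\]

The computation of $u_k^\vartheta$ for all $k \ge 2$ is done iteratively by
observing that $u_k^\vartheta = \psi^\vartheta_{y_{k-1}}(u_{k-1}^\vartheta)$
and the computation of
$\mathsf{L}^{(\vartheta, r)}_{x,n}\langle y_{1:n}\rangle$ is deduced. The
derivatives of the function
$\mathsf{L}^{(\vartheta, r)}_{x,n}\langle y_{1:n}\rangle$ with respect to the
parameter $\theta = (\vartheta, r)$ are then obtained in two steps. First,
for $k \ge 2$, the derivatives of $u_k^\vartheta$ with respect to $\vartheta$
are obtained iteratively by $\partial u_1^\vartheta / \partial\vartheta = 0$
and

\[
\frac{\partial u_k^\vartheta}{\partial \vartheta}
  = (1, u_{k-1}^\vartheta, y_{k-1}) + a\, \frac{\partial u_{k-1}^\vartheta}{\partial \vartheta} .
\]

Then the derivatives of
$\mathsf{L}^{(\vartheta, r)}_{x,n}\langle y_{1:n}\rangle$ with respect to
$\vartheta$ and r are given by

\[
\frac{\partial \mathsf{L}^{(\vartheta, r)}_{x,n}\langle y_{1:n}\rangle}{\partial \vartheta}
  = n^{-1} \sum_{k=1}^{n} \Bigl( \frac{y_k}{u_k^\vartheta} - \frac{y_k + r}{1 + u_k^\vartheta} \Bigr)
    \frac{\partial u_k^\vartheta}{\partial \vartheta}
\]

and

\[
\frac{\partial \mathsf{L}^{(\vartheta, r)}_{x,n}\langle y_{1:n}\rangle}{\partial r}
  = n^{-1} \sum_{k=1}^{n} \bigl( \digamma(r + y_k) - \ln(1 + u_k^\vartheta) \bigr) - \digamma(r) ,
\]

respectively, where $\digamma$ is the digamma function
$\digamma(r) = \frac{\mathrm{d}}{\mathrm{d}r} \ln \Gamma(r)$, $r > 0$.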
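
The recursions above translate directly into code. The sketch below is in
Python rather than the R code used for the experiments, and it is only an
illustration under stated assumptions: the negative binomial density written
in `log_g` is inferred from the derivative formulas of this section (Equation
(26) is not reproduced here), the digamma function is implemented by hand
because the Python standard library does not provide one, and all function
names and data values are hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:             # shift upwards: digamma(x) = digamma(x+1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))


def log_g(r, x, y):
    """ln g^r(x; y) for the pmf assumed here:
    Gamma(r+y) / (y! Gamma(r)) * (1/(1+x))**r * (x/(1+x))**y."""
    return (math.lgamma(r + y) - math.lgamma(y + 1.0) - math.lgamma(r)
            + y * math.log(x) - (r + y) * math.log1p(x))


def loglik_and_grad(theta, ys, x):
    """Normalized log-likelihood and its gradient in (omega, a, b, r)."""
    omega, a, b, r = theta
    n = len(ys)
    u = x                       # u_1 = x
    du = [0.0, 0.0, 0.0]        # d u_1 / d(omega, a, b) = 0
    loglik, grad_vartheta, grad_r = 0.0, [0.0, 0.0, 0.0], 0.0
    for k, y in enumerate(ys):
        if k > 0:               # u_k = omega + a*u_{k-1} + b*y_{k-1}
            y_prev = ys[k - 1]
            # The derivative recursion uses u_{k-1}, so update du before u.
            du = [1.0 + a * du[0], u + a * du[1], y_prev + a * du[2]]
            u = omega + a * u + b * y_prev
        loglik += log_g(r, u, y)
        weight = y / u - (y + r) / (1.0 + u)    # d ln g / d x at (u_k, y_k)
        grad_vartheta = [g + weight * d for g, d in zip(grad_vartheta, du)]
        grad_r += digamma(r + y) - math.log1p(u) - digamma(r)
    return loglik / n, [g / n for g in grad_vartheta] + [grad_r / n]


ys = [3, 5, 2, 8, 4, 6, 1, 7]       # hypothetical counts
theta = (3.0, 0.2, 0.2, 2.0)        # (omega, a, b, r), same values as (M.1)
value, grad = loglik_and_grad(theta, ys, x=1.0)

# Central finite differences as a sanity check of the analytic gradient.
eps = 1e-6
fd = []
for i in range(4):
    up = list(theta)
    up[i] += eps
    dn = list(theta)
    dn[i] -= eps
    fd.append((loglik_and_grad(up, ys, 1.0)[0]
               - loglik_and_grad(dn, ys, 1.0)[0]) / (2 * eps))
```

A function of this shape returns the value and gradient that a constrained
optimizer needs; the stability constraint $a + br < 1$ would be handled by
the optimizer and is not enforced in this sketch.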

5.2 Simulation study 

We consider two NBIN-GARCH(1,1) models with parameters:

(M.1) $\theta_\star = (\omega_\star, a_\star, b_\star, r_\star) = (3, .2, .2, 2)$ and

(M.2) $\theta_\star = (\omega_\star, a_\star, b_\star, r_\star) = (3, .35, .1, 1.5)$.

We simulated m = 200 data sets for each sample size $n = 2^7, 2^8, 2^9$ and
$2^{10}$. In Figure 2, we display the obtained boxplots of the difference of
the normalized log-likelihood function evaluated respectively at the MLE and
at the true value $\theta_\star$. As predicted by the theory, this difference
appears to converge to 0 as the number of observations $n \to \infty$. For
the NBIN-GARCH(1,1) model, it can be shown that
$\Theta_\star = \{\theta_\star\}$, which implies the convergence of the MLE to
the true parameter. We can observe this behavior for each component of the
MLE for the two models in Figure 3 and Figure 4. We also report the Monte
Carlo mean along with the mean absolute deviation error (MADE),
$\mathrm{MADE} = m^{-1} \sum_{j=1}^{m} |\hat\theta^{(j)}_{x,n} - \theta_\star|$,
where $\hat\theta^{(j)}_{x,n}$ denotes the estimate obtained from the j-th
data set and the deviation is computed componentwise, as an evaluation
criterion for the estimated parameter in Table 1.
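
For concreteness, the two summary statistics reported in Table 1 can be
computed as in the following sketch. It is an illustration only: the helper
name `monte_carlo_summary` and the three replications below are hypothetical
and are not results from the paper.

```python
def monte_carlo_summary(estimates, true_theta):
    """Componentwise Monte Carlo mean and MADE over m replications."""
    m = len(estimates)
    dim = len(true_theta)
    means = [sum(est[i] for est in estimates) / m for i in range(dim)]
    mades = [sum(abs(est[i] - true_theta[i]) for est in estimates) / m
             for i in range(dim)]
    return means, mades


# Hypothetical estimates of (omega, a, b, r) from m = 3 replications.
estimates = [(3.2, 0.15, 0.21, 2.1),
             (2.9, 0.25, 0.19, 1.9),
             (3.5, 0.20, 0.20, 2.0)]
means, mades = monte_carlo_summary(estimates, (3.0, 0.2, 0.2, 2.0))
```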


Table 1: Mean of estimates, MADEs (within parentheses) for the
NBIN-GARCH(1,1) models

                                     Sample size n
Model   Parameter   n = 2^7        n = 2^8        n = 2^9        n = 2^10
(M.1)   omega       3.311(.973)    3.212(.719)    3.108(.507)    3.062(.372)
        a           .165(.138)     .173(.113)     .187(.076)     .193(.055)
        b           .194(.049)     .195(.034)     .197(.025)     .200(.018)
        r           2.045(.241)    2.035(.166)    2.020(.112)    2.011(.074)
(M.2)   omega       3.525(1.325)   3.362(1.258)   3.326(1.041)   3.167(.761)
        a           .252(.227)     .290(.213)     .296(.170)     .319(.136)
        b           .092(.056)     .097(.039)     .098(.028)     .100(.022)
        r           1.563(.175)    1.539(.129)    1.520(.093)    1.513(.066)


[Figure 2 panels: Model (M.1) and Model (M.2).]

Figure 2: Boxplots of the differences of log-likelihood functions evaluated at
the estimated MLE and the true value for Models (M.1) and (M.2) with
sample sizes $n = 2^7, 2^8, 2^9$ and $n = 2^{10}$, respectively. The red
“continuous” line indicates the position of zero.

Figure 3: Boxplots of the estimated MLE for Model (M.1) with sample sizes
$n = 2^7, 2^8, 2^9$ and $n = 2^{10}$, respectively. The red “dashed” line
indicates the true value of the parameter and the blue “x” indicates the
location of the Monte Carlo mean of the MLE.

Figure 4: Same as Figure 3 but for Model (M.2).


6 Postponed proofs 


6.1 Convergence of the MLE 


Assumptions (A-1) and (A-2) are supposed to hold throughout this section. 
The proof of Theorem 3 relies on the approach introduced in [18], which 
was already used in [7] for a restricted class of observation-driven models. 
Our main contribution here is to provide the handy conditions listed in 
Assumption (B-4). We first show that our conditions imply (B-1) and the 
following one. 

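(B-5) There exists $x_1 \in \mathsf{X}$ such that, for all
$\theta, \theta_\star \in \Theta$, $p^\theta(Y_1 \mid Y_{-\infty:0})$ defined
as in (14) with $x = x_1$ is finite $\mathbb{P}^{\theta_\star}$-a.s.
Moreover, for all $\theta_\star \in \Theta$, we have

\[
\lim_{k \to \infty} \sup_{\theta \in \Theta}
  \Bigl| \ln \frac{g^\theta\bigl( \psi^\theta\langle Y_{1:k-1}\rangle(x_1); Y_k \bigr)}
                  {p^\theta(Y_k \mid Y_{-\infty:k-1})} \Bigr| = 0
  \quad \mathbb{P}^{\theta_\star}\text{-a.s.} \tag{41}
\]

Indeed we have the following lemma.

Lemma 9. Assumptions (B-2), (B-3) and (B-4) imply (B-5) and (B-1).

Proof. See Section 6.3. □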


Now the proof of Theorem 3 directly follows from the following lemma.

Lemma 10. Assume that (B-2), (B-3) and (B-4)(i)-(ii) hold and that $x_1$
satisfies (B-5). Then $\Theta_\star$ defined by (15) is a non-empty closed
subset of $\Theta$ and (16) holds.

Proof. By [7, Theorem 33], to obtain (16), it is sufficient to show that, for
all $\theta_\star \in \Theta$, the two following assertions hold.

(a) $\mathbb{E}^{\theta_\star}\bigl[ \sup_{\theta \in \Theta} \ln^+ p^\theta(Y_1 \mid Y_{-\infty:0}) \bigr] < \infty$,

(b) the function $\theta \mapsto \ln p^\theta(Y_1 \mid Y_{-\infty:0})$ is
continuous on $\Theta$, $\mathbb{P}^{\theta_\star}$-a.s.

In (B-5), $p^\theta(Y_1 \mid Y_{-\infty:0})$ is defined
$\mathbb{P}^{\theta_\star}$-a.s. as the limit in (14) with $x = x_1$. So,
$\mathbb{P}^{\theta_\star}$-a.s., by (B-4)(i)-(ii),
$p^\theta(Y_1 \mid Y_{-\infty:0})$ is bounded by the finite constant
appearing in (B-4)(ii). Hence Condition (a) holds.

Condition (b) then follows from (41). Since almost sure convergence implies
convergence in probability and $\mathbb{P}^{\theta_\star}$ is shift
invariant, the random sequence

\[
U_m := \sup_{\theta \in \Theta}
  \Bigl| \ln \frac{g^\theta\bigl( \psi^\theta\langle Y_{-m:0}\rangle(x_1); Y_1 \bigr)}
                  {p^\theta(Y_1 \mid Y_{-\infty:0})} \Bigr| ,
  \quad m \in \mathbb{Z}_+ ,
\]

converges to zero in $\mathbb{P}^{\theta_\star}$-probability. Then there
exists a subsequence of $(U_m)$ which converges
$\mathbb{P}^{\theta_\star}$-a.s. to zero. Interpreting this convergence as a
uniform (in $\theta$) convergence of
$\ln g^\theta(\psi^\theta\langle Y_{-m:0}\rangle(x_1); Y_1)$ to
$\ln p^\theta(Y_1 \mid Y_{-\infty:0})$, it suffices, in order to conclude
that (b) holds, to show that
$\theta \mapsto \ln g^\theta(\psi^\theta\langle Y_{-m:0}\rangle(x_1); Y_1)$
is continuous for all m, $\mathbb{P}^{\theta_\star}$-a.s. This is indeed the
case by (B-2) and (B-3) and since $g^\theta(x; y)$ is positive. □

6.2 Ergodicity

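For proving Theorem 4, we first recall a more general set of conditions
derived in [7], which are based on the following definition.

Definition 11. Let $\bar G$ be a probability kernel from
$(\mathsf{X}^2, \mathcal{X}^{\otimes 2})$ to
$(\mathsf{Y}^2 \times \{0,1\}, \mathcal{Y}^{\otimes 2} \otimes \mathcal{P}(\{0,1\}))$
satisfying the following marginal conditions, for all
$(x, x') \in \mathsf{X}^2$ and $B \in \mathcal{Y}$,

\[
\begin{aligned}
\bar G((x, x'); B \times \mathsf{Y} \times \{0,1\}) &= G(x; B) , \\
\bar G((x, x'); \mathsf{Y} \times B \times \{0,1\}) &= G(x'; B) ,
\end{aligned} \tag{42}
\]

and such that the following coupling condition holds

\[
\bar G((x, x'); \{(y, y) : y \in \mathsf{Y}\} \times \{1\})
  = \bar G((x, x'); \mathsf{Y}^2 \times \{1\}) . \tag{43}
\]

Define the following quantities successively.

• The trace measure of $\bar G((x, x'); \cdot)$ on the set
$\{(y, y) : y \in \mathsf{Y}\} \times \{1\}$ is denoted by

\[
\check G((x, x'); B) = \bar G((x, x'); \{(y, y) : y \in B\} \times \{1\}) ,
  \quad B \in \mathcal{Y} . \tag{44}
\]

• The probability kernel $\bar R$ from
$(\mathsf{X}^2, \mathcal{X}^{\otimes 2})$ to
$(\mathsf{X}^2 \times \{0,1\}, \mathcal{X}^{\otimes 2} \otimes \mathcal{P}(\{0,1\}))$
is defined for all $(x, x') \in \mathsf{X}^2$ and
$A \in \mathcal{X}^{\otimes 2}$ by

\[
\bar R((x, x'); A \times \{1\})
  = \int \mathbb{1}_A(\psi_y(x), \psi_y(x'))\, \check G((x, x'); \mathrm{d}y) . \tag{45}
\]

• The measurable function $\alpha$ from $\mathsf{X}^2$ to $[0, 1]$ is defined by

\[
\alpha(x, x') = \bar R((x, x'); \mathsf{X}^2 \times \{1\})
  = \bar G((x, x'); \mathsf{Y}^2 \times \{1\}) . \tag{46}
\]

• The kernel $\hat R$ is defined for all $(x, x') \in \mathsf{X}^2$ and
$A \in \mathcal{X}^{\otimes 2}$ by

\[
\hat R((x, x'); A) =
\begin{cases}
\dfrac{\bar R((x, x'); A \times \{1\})}{\alpha(x, x')} & \text{if } \alpha(x, x') > 0, \\[1ex]
0 & \text{otherwise.}
\end{cases} \tag{47}
\]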


We can now introduce the so-called contracting condition which yields
ergodicity.

(A-9) There exists a kernel $\bar G$ yielding $\alpha$ and $\hat R$ as in
Definition 11, a measurable function $W : \mathsf{X}^2 \to [1, \infty)$
satisfying Conditions (A-8)(ii) and (A-8)(iii) and real numbers
$(D, \zeta_1, \zeta_2, \rho) \in (\mathbb{R}_+)^3 \times (0, 1)$ such that for
all $(x, x') \in \mathsf{X}^2$ and for all $n \ge 1$,

\[
\hat R^n((x, x'); d) \le D \rho^n d(x, x') , \tag{48}
\]

\[
\hat R^n((x, x'); d \times W) \le D \rho^n d^{\zeta_1}(x, x')\, W^{\zeta_2}(x, x') . \tag{49}
\]

Under Conditions (A-3), (A-4), (A-5), (A-6) and (A-9) and by combining
Theorem 6, Proposition 8 and Lemma 7 in [7], we immediately obtain the
following result.

Theorem 12. Assume (A-3), (A-4), (A-5), (A-6) and (A-9). Then the
Markov kernel K admits a unique invariant distribution $\pi$ and
$\pi_1(\bar V) < \infty$ for any $\bar V : \mathsf{X} \to \mathbb{R}_+$ such
that $\bar V \lesssim V$.

Assumptions (A-3), (A-4), (A-5) and (A-6) are quite usual and easy to
check. The key point to obtain ergodicity is thus to construct $\bar G$
satisfying (A-9). For this, we can also rely on the following result which is
quoted from [7, Lemma 9].

Lemma 13. Assume that there exists
$(\rho, \beta) \in (0, 1) \times \mathbb{R}$ such that for all
$(x, x') \in \mathsf{X}^2$,

\[
\hat R\bigl( (x, x'); \{ (x_1, x'_1) \in \mathsf{X}^2 : d(x_1, x'_1) > \rho\, d(x, x') \} \bigr) = 0 , \tag{50}
\]

\[
\hat R W \le W + \beta . \tag{51}
\]

Then, (48) and (49) hold.

Now we can prove that our set of conditions is sufficient. 

Proof of Theorem 4. We only need to show that (A-7) and (A-8) imply (A-9).
We preface our proof by the following lemma.

Lemma 14. Assume (A-8)(i). Then one can define a kernel $\bar G$ as
in Definition 11 with the same $\alpha$ given in (46). Moreover, the kernel
$\hat R$ defined by (45) satisfies, for all $(x, x') \in \mathsf{X}^2$ such
that $\alpha(x, x') > 0$ and all measurable functions
$f : \mathsf{X}^2 \to \mathbb{R}_+$,

\[
\hat R((x, x'); f) = G(\phi(x, x'); \tilde f)
  \quad\text{with}\quad \tilde f(y) = f(\psi_y(x), \psi_y(x')) . \tag{52}
\]

Let us conclude the proof of Theorem 4 before proving this lemma. By
Lemma 14 and Lemma 13, it remains to check that (50) and (51) hold for
all $(x, x') \in \mathsf{X}^2$. Observe that by definition of $\hat R$,
Condition (A-8)(iv) is equivalent to

\[
\sup_{(x, x') \in \mathsf{X}^2} \bigl( \hat R W(x, x') - W(x, x') \bigr) < \infty ,
\]

so we can find $\beta \in \mathbb{R}$ such that (51) holds for all
$(x, x') \in \mathsf{X}^2$.

Now, let $(x, x') \in \mathsf{X}^2$ and let $(X, X')$ be distributed according
to $\hat R((x, x'); \cdot)$ which is defined in (52). When $x = x'$, then
$d(X, X') = 0$, implying that Condition (50) holds with any nonnegative
$\rho$. For $x \ne x'$, let $\rho$ be defined by

\[
\rho = \sup_{\substack{(x, x', y) \in \mathsf{X}^2 \times \mathsf{Y} \\ x \ne x'}}
  \frac{d(\psi_y(x), \psi_y(x'))}{d(x, x')} , \tag{53}
\]

which is in $(0, 1)$ by (A-7). Then

\[
\frac{d(X, X')}{d(x, x')} = \frac{d(\psi_Y(x), \psi_Y(x'))}{d(x, x')} \le \rho .
\]

Therefore, Condition (50) holds for all $(x, x') \in \mathsf{X}^2$ with
$\rho$ as in (53). □


We conclude this section with the postponed

Proof of Lemma 14. Let $(x, x') \in \mathsf{X}^2$. We define
$\bar G((x, x'); \cdot)$ as the distribution of $(Y, Y', U)$ drawn as follows.
We first draw a random variable $\bar Y$ taking values in Y with density
$g(\phi(x, x'); \cdot)$ with respect to $\nu$. Then we define $(Y, Y', U)$ by
separating the two cases, $\alpha(x, x') = 1$ and $\alpha(x, x') < 1$.

• Suppose that $\alpha(x, x') = 1$. Then from (A-8)(i), we have

\[
G(x; \cdot) = G(x'; \cdot) = G(\phi(x, x'); \cdot) .
\]

In this case, we set $(Y, Y', U) = (\bar Y, \bar Y, 1)$.

• Suppose now that $\alpha(x, x') < 1$. Then, using (22), the functions

\[
(1 - \alpha(x, x'))^{-1} \bigl[ g(x; \cdot) - \alpha(x, x')\, g(\phi(x, x'); \cdot) \bigr]
\]

and

\[
(1 - \alpha(x, x'))^{-1} \bigl[ g(x'; \cdot) - \alpha(x, x')\, g(\phi(x, x'); \cdot) \bigr]
\]

are probability density functions with respect to $\nu$ and we let $\Lambda$
and $\Lambda'$ be two independent random variables taking values in Y drawn
with these two density functions, respectively. In this case we draw U
independently according to a Bernoulli variable with mean $\alpha(x, x')$
and set

\[
(Y, Y') =
\begin{cases}
(\bar Y, \bar Y) & \text{if } U = 1, \\
(\Lambda, \Lambda') & \text{if } U = 0.
\end{cases}
\]

One can easily check that the so defined kernel $\bar G$ satisfies (42) and
(43). Moreover, for all $(x, x') \in \mathsf{X}^2$,

\[
\bar G((x, x'); \mathsf{Y}^2 \times \{1\}) = \mathbb{P}(U = 1) = \alpha(x, x') ,
\]

which is compatible with (46). The kernel $\hat R$ is defined by setting
$\hat R((x, x'); \cdot)$ as the conditional distribution of
$(X, X') = (\psi_Y(x), \psi_{Y'}(x'))$ given that $U = 1$. To complete the
proof of Lemma 14, observe that for any measurable
$f : \mathsf{X}^2 \to \mathbb{R}_+$, we have, for all
$(x, x') \in \mathsf{X}^2$ such that $\alpha(x, x') > 0$,

\[
\begin{aligned}
\hat R((x, x'); f) &= \mathbb{E}\bigl[ f(\psi_Y(x), \psi_{Y'}(x')) \mid U = 1 \bigr] \\
  &= \mathbb{E}\bigl[ f(\psi_{\bar Y}(x), \psi_{\bar Y}(x')) \bigr] \\
  &= G(\phi(x, x'); \tilde f) ,
\end{aligned}
\]

where $\tilde f(y) = f(\psi_y(x), \psi_y(x'))$ for all $y \in \mathsf{Y}$.


□ 
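
The construction in the proof of Lemma 14 can be made concrete in the
threshold INGARCH setting of Section 4.3, where $g(x; \cdot)$ is the Poisson
density (39), $\phi(x, x') = x \wedge x'$ and $\alpha$ is as given there. The
sketch below is illustrative only: the function names are hypothetical, and
sampling is done by inverse cdf on a truncated range, which is a numerical
simplification rather than part of the proof.

```python
import math
import random


def poisson_pmf(mean, y):
    """Poisson probability mass function, evaluated on the log scale."""
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1.0))


def alpha(x, xp, tau):
    """alpha(x, x') for the threshold INGARCH model of Section 4.3."""
    return math.exp(-(min(max(x, xp), tau) - min(min(x, xp), tau)))


def residual_pmf(x, xp, tau, y):
    """(1 - alpha)^{-1} [g(x; y) - alpha g(phi(x, x'); y)], phi = min(x, x')."""
    al = alpha(x, xp, tau)
    g_x = poisson_pmf(min(x, tau), y)
    g_phi = poisson_pmf(min(min(x, xp), tau), y)
    return (g_x - al * g_phi) / (1.0 - al)


def sample_from_pmf(pmf, rng, y_max=200):
    """Inverse-cdf sampling on {0, ..., y_max} (truncation is a simplification)."""
    u, cdf = rng.random(), 0.0
    for y in range(y_max + 1):
        cdf += pmf(y)
        if u <= cdf:
            return y
    return y_max


def draw_coupling(x, xp, tau, rng):
    """Draw (Y, Y', U) following the two cases of the proof of Lemma 14."""
    al = alpha(x, xp, tau)
    y_bar = sample_from_pmf(lambda y: poisson_pmf(min(min(x, xp), tau), y), rng)
    if al >= 1.0 or rng.random() < al:      # U = 1: both coordinates share y_bar
        return y_bar, y_bar, 1
    lam = sample_from_pmf(lambda y: residual_pmf(x, xp, tau, y), rng)
    lam_p = sample_from_pmf(lambda y: residual_pmf(xp, x, tau, y), rng)
    return lam, lam_p, 0


x, xp, tau = 2.0, 3.5, 3.0                   # hypothetical values
res = [residual_pmf(x, xp, tau, y) for y in range(60)]
res_p = [residual_pmf(xp, x, tau, y) for y in range(60)]
y, y_p, u = draw_coupling(x, xp, tau, random.Random(0))
```

The two residual functions are nonnegative and sum to one, which is what
makes the case $\alpha(x, x') < 1$ of the proof well defined; mixing each of
them with $g(\phi(x, x'); \cdot)$ using weight $\alpha(x, x')$ recovers the
marginals required by (42).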
6.3 Proof of Lemma 9

Under (A-2), Assumption (B-4)(viii) implies that for all $\theta \in \Theta$,

\[
\pi_2^\theta(\ln^+ \bar\phi) < \infty , \tag{54}
\]

and if moreover $C > 0$,

\[
\pi_2^\theta(\bar\phi) < \infty . \tag{55}
\]

For proving Lemma 9, we will also make use of [7, Lemma 34] which we
quote here for convenience.

Lemma 15. Let $\{U_n\}_{n \in \mathbb{Z}_+}$ be a stationary sequence of
real-valued random variables on $(\Omega, \mathcal{F}, \mathbb{P})$. Assume
that $\mathbb{E}(\ln^+ |U_0|) < \infty$. Then, for all $\eta \in (0, 1)$,

\[
\lim_{k \to \infty} \eta^k U_k = 0 , \quad \mathbb{P}\text{-a.s.}
\]
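
Lemma 15 is quoted from [7] without proof. For orientation, a standard
Borel-Cantelli argument runs as follows; this is a sketch of one possible
justification and is not taken from [7]. Fix $\eta \in (0, 1)$ and any
$\epsilon \in (0, -\ln \eta)$.

```latex
\begin{align*}
\sum_{k \ge 1} \mathbb{P}\bigl( \ln^+ |U_k| > \epsilon k \bigr)
  &= \sum_{k \ge 1} \mathbb{P}\bigl( \ln^+ |U_0| > \epsilon k \bigr)
     && \text{(stationarity)} \\
  &\le \epsilon^{-1}\, \mathbb{E}\bigl( \ln^+ |U_0| \bigr) < \infty .
\end{align*}
% By the Borel--Cantelli lemma, almost surely |U_k| <= e^{\epsilon k}
% for all k large enough, and for such k
\[
  \eta^k |U_k| \le \exp\bigl( -k (-\ln \eta - \epsilon) \bigr)
  \xrightarrow[k \to \infty]{} 0 .
\]
```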


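Proof of Lemma 9. We first show that $p^\theta(y \mid Y_{-\infty:0})$ in (14)
is finite for $x = x_1$, $\mathbb{P}^{\theta_\star}$-a.s. By (B-2), this
follows by writing

\[
p^\theta(y_1 \mid Y_{-\infty:0}) = g^\theta\bigl( \psi^\theta\langle Y_{-\infty:0}\rangle; y_1 \bigr) \tag{56}
\]

if, for all $\theta, \theta_\star \in \Theta$, the limit

\[
\psi^\theta\langle Y_{-\infty:0}\rangle = \lim_{m \to \infty} \psi^\theta\langle Y_{-m:0}\rangle(x_1)
  \quad\text{is well defined } \mathbb{P}^{\theta_\star}\text{-a.s.} \tag{57}
\]

For all $\theta \in \Theta$, $m \ge 0$, $x \in \mathsf{X}$ and
$y_{-m:0} \in \mathsf{Y}^{m+1}$, using (B-4)(iii), we have

\[
d\bigl( \psi^\theta\langle y_{-m:0}\rangle(x_1), \psi^\theta\langle y_{-m:0}\rangle(x) \bigr)
  \le \varrho^{m+1}\, \bar\psi(x) . \tag{58}
\]

Taking $x = \psi^\theta_{y_{-m-1}}(x_1)$ and using (B-4)(v), we obtain, for
all $y_{-m-1:0} \in \mathsf{Y}^{m+2}$,

\[
d\bigl( \psi^\theta\langle y_{-m:0}\rangle(x_1), \psi^\theta\langle y_{-m-1:0}\rangle(x_1) \bigr)
  \le \varrho^{m+1}\, \bar\phi(y_{-m-1}) .
\]

Using (54) and Lemma 15, we have that

\[
\forall \eta \in (0, 1), \quad \sum_{k \in \mathbb{Z}} \eta^{|k|}\, \bar\phi(Y_k) < \infty ,
  \quad \mathbb{P}^{\theta_\star}\text{-a.s.} , \tag{59}
\]

and thus $(\psi^\theta\langle Y_{-m:0}\rangle(x_1))_{m \ge 0}$ is a Cauchy
sequence $\mathbb{P}^{\theta_\star}$-a.s. Its limit exists
$\mathbb{P}^{\theta_\star}$-a.s., since $(\mathsf{X}, d)$ is assumed to be
complete, which defines the X-valued random variable
$\psi^\theta\langle Y_{-\infty:0}\rangle$ for all
$\theta, \theta_\star \in \Theta$ when Y has distribution
$\mathbb{P}^{\theta_\star}$. Thus (57) holds and we further obtain that

\[
\begin{aligned}
\sup_{\theta \in \Theta} d\bigl( \psi^\theta\langle Y_{-k:0}\rangle(x_1), x_1 \bigr)
  &\le \sup_{\theta \in \Theta} \sum_{m=0}^{k}
     d\bigl( \psi^\theta\langle Y_{-m:0}\rangle(x_1), \psi^\theta\langle Y_{-m+1:0}\rangle(x_1) \bigr) \\
  &\le \sum_{m \ge 0} \varrho^m\, \bar\phi(Y_{-m}) < \infty ,
     \quad \mathbb{P}^{\theta_\star}\text{-a.s.}
\end{aligned} \tag{60}
\]

so that, letting $k \to \infty$,

\[
\sup_{\theta \in \Theta} d\bigl( \psi^\theta\langle Y_{-\infty:0}\rangle, x_1 \bigr)
  \le \sum_{m \ge 0} \varrho^m\, \bar\phi(Y_{-m}) < \infty ,
  \quad \mathbb{P}^{\theta_\star}\text{-a.s.} \tag{61}
\]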

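Let us now prove (B-1). Relation (56) directly yields (B-1)(i). Let us
prove (B-1)(ii), hence consider the case $\theta = \theta_\star$. Using (58),
we have

\[
d\bigl( \psi^{\theta_\star}\langle Y_{-m:0}\rangle(x_1), \psi^{\theta_\star}\langle Y_{-m:0}\rangle(X_{-m}) \bigr)
  \le \varrho^{m+1}\, \bar\psi(X_{-m}) \quad \mathbb{P}^{\theta_\star}\text{-a.s.}
\]

Since $\{\bar\psi(X_{-m})\}_{m \ge 0}$ is stationary under
$\mathbb{P}^{\theta_\star}$, it is bounded in probability, and since
$\varrho < 1$, for all $\epsilon > 0$, we have

\[
\lim_{m \to \infty} \mathbb{P}^{\theta_\star}\Bigl(
  d\bigl( \psi^{\theta_\star}\langle Y_{-m:0}\rangle(X_{-m}), \psi^{\theta_\star}\langle Y_{-m:0}\rangle(x_1) \bigr) > \epsilon
\Bigr) = 0 . \tag{62}
\]

Note that for all $m \ge 1$,
$\psi^{\theta_\star}\langle Y_{-m:0}\rangle(X_{-m}) = X_1$
$\mathbb{P}^{\theta_\star}$-a.s., hence we get that

\[
\psi^{\theta_\star}\langle Y_{-\infty:0}\rangle = X_1 \quad \mathbb{P}^{\theta_\star}\text{-a.s.} \tag{63}
\]

To complete the proof of (B-1)(ii), we need to show that, under
$\mathbb{P}^{\theta_\star}$,
$y \mapsto g^{\theta_\star}(\psi^{\theta_\star}\langle Y_{-\infty:0}\rangle; y) = g^{\theta_\star}(X_1; y)$
is the conditional density of $Y_1$ given $Y_{-\infty:0}$, that is, for any
$B \in \mathcal{Y}$,

\[
\int \mathbb{1}_B(y)\, g^{\theta_\star}(X_1; y)\, \nu(\mathrm{d}y)
  = \mathbb{P}^{\theta_\star}\bigl( Y_1 \in B \mid Y_{-\infty:0} \bigr) .
\]

Now, note that, by definition of $\mathbb{P}^{\theta_\star}$,

\[
\int \mathbb{1}_B(y)\, g^{\theta_\star}(X_1; y)\, \nu(\mathrm{d}y)
  = \mathbb{P}^{\theta_\star}\bigl( Y_1 \in B \mid X_1 \bigr)
  = \mathbb{P}^{\theta_\star}\bigl( Y_1 \in B \mid X_1, Y_{-\infty:0} \bigr) .
\]

But since (63) implies that $X_1$ is $\sigma(Y_{-\infty:0})$-measurable,
$X_1$ can be removed in the last conditioning, which concludes the proof of
(B-1)(ii).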

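Finally, it remains to show the uniform convergence (41) in (B-5).
By (B-3) and (57), we have, for all $\theta, \theta_\star \in \Theta$,
$k \in \mathbb{Z}_+$,

\[
\psi^\theta\langle Y_{-\infty:k-1}\rangle
  = \psi^\theta\langle Y_{1:k-1}\rangle\bigl( \psi^\theta\langle Y_{-\infty:0}\rangle \bigr) ,
  \quad \mathbb{P}^{\theta_\star}\text{-a.s.} \tag{64}
\]

From (B-4)(iii) and (64), we get

\[
d\bigl( \psi^\theta\langle Y_{1:k-1}\rangle(x_1), \psi^\theta\langle Y_{-\infty:k-1}\rangle \bigr)
  \le \varrho^{k-1}\, \bar\psi\bigl( \psi^\theta\langle Y_{-\infty:0}\rangle \bigr) ,
  \quad \mathbb{P}^{\theta_\star}\text{-a.s.}
\]

On the other hand (B-4)(iv) and (61) imply

\[
\sup_{\theta \in \Theta} \bar\psi\bigl( \psi^\theta\langle Y_{-\infty:0}\rangle \bigr) < \infty ,
  \quad \mathbb{P}^{\theta_\star}\text{-a.s.} , \tag{65}
\]

which, with the previous display, yields

\[
\sup_{\theta \in \Theta} d\bigl( \psi^\theta\langle Y_{1:k-1}\rangle(x_1), \psi^\theta\langle Y_{-\infty:k-1}\rangle \bigr)
  = O_{k \to \infty}(\varrho^k) \quad \mathbb{P}^{\theta_\star}\text{-a.s.} \tag{66}
\]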

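Since $\mathsf{X}_1$ is closed and satisfies Condition (B-4)(i), we have that
$\psi^\theta\langle Y_{1:k-1}\rangle(x_1)$ and
$\psi^\theta\langle Y_{-\infty:k-1}\rangle$ are in $\mathsf{X}_1$ for all
$k \ge 2$. Thus Condition (B-4)(vi) gives that

\[
\sup_{\theta \in \Theta}
  \Bigl| \ln \frac{g^\theta\bigl( \psi^\theta\langle Y_{1:k-1}\rangle(x_1); Y_k \bigr)}
                  {g^\theta\bigl( \psi^\theta\langle Y_{-\infty:k-1}\rangle; Y_k \bigr)} \Bigr|
  \le A_k(1) \times A_k(2) \times A_k(3) \times A_k(4)
  \quad \mathbb{P}^{\theta_\star}\text{-a.s.}
\]

where

\[
\begin{aligned}
A_k(1) &= \sup_{\theta \in \Theta} H\bigl( d( \psi^\theta\langle Y_{1:k-1}\rangle(x_1), \psi^\theta\langle Y_{-\infty:k-1}\rangle ) \bigr) , \\
A_k(2) &= \sup_{\theta \in \Theta} \exp\bigl( C\, d( x_1, \psi^\theta\langle Y_{-\infty:k-1}\rangle ) \bigr) , \\
A_k(3) &= \sup_{\theta \in \Theta} \exp\bigl( C\, d( x_1, \psi^\theta\langle Y_{1:k-1}\rangle(x_1) ) \bigr) , \\
A_k(4) &= \bar\phi(Y_k) .
\end{aligned}
\]

By (66) and (B-4)(vii), we have

\[
A_k(1) = O_{k \to \infty}(\varrho^k) \quad \mathbb{P}^{\theta_\star}\text{-a.s.}
\]

With (59), this yields (41) in the case where $C = 0$. For $C > 0$, we
further observe that, by (61) and (55), we have, for all
$\theta_\star \in \Theta$ and $k \in \mathbb{Z}_+$,

\[
\mathbb{E}^{\theta_\star}\bigl[ \ln^+ A_k(2) \bigr]
  \le \mathbb{E}^{\theta_\star}\Bigl[ C \sum_{m \ge 0} \varrho^m\, \bar\phi(Y_{-m+k-1}) \Bigr]
  = \frac{C\, \pi_2^{\theta_\star}(\bar\phi)}{1 - \varrho} .
\]

Then Lemma 15 implies that, $\mathbb{P}^{\theta_\star}$-a.s.,
$A_k(2) = O(\eta^{-k})$ for any $\eta \in (0, 1)$. The same property applies
similarly to $A_k(3)$ by using (60) in place of (61). This yields (41) in the
case where $C > 0$, which concludes the proof. □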